Current statistics (2024/4/26)
  Number of viral genes (vg entries)             688,823
  Number of viral mature peptides (vp entries)       377
  Number of vg/vp entries with assigned KOs       49,609
  Number of KOs assigned to vg/vp entries          1,572
  Number of virus-specific KOs                     1,147

  Vog version 2024-03-29, taken from RefSeq 223 (Mar 2024)
  Total number of proteins: 676,533

  Vog threshold                      30%       50%       70%
  Number of groups                50,667    76,301    87,377
  Number of proteins in groups   605,594   551,101   494,713

Note: Voc (Virus ortholog cluster) has been renamed Vog (Virus ortholog group).

KEGG Virus Resource

KEGG Virus is an interface to virus data that are part of the GENES, KO, GENOME, BRITE, PATHWAY, NETWORK, DISEASE and DRUG databases in KEGG. It also contains a virus-specific dataset of virus ortholog groups.


Virus genes and genomes

The virus category of the GENES and GENOME databases is generated from the bimonthly release of the NCBI RefSeq database. NCBI GeneIDs are used as gene identifiers, and the organism code "vg" (with the category identifier T40000) is used for the entire set of virus genes. Thus, each virus gene in KEGG is identified by vg:<gene_id>, where <gene_id> is the NCBI GeneID.
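A minimal sketch of this identifier convention in Python. The helper names are hypothetical (they are not part of any KEGG software), and GeneIDs are assumed to be positive integers written in decimal.

```python
def to_vg_id(gene_id: int) -> str:
    """Build a KEGG virus gene identifier vg:<gene_id> from an NCBI GeneID (hypothetical helper)."""
    if gene_id <= 0:
        raise ValueError("NCBI GeneIDs are assumed to be positive integers")
    return f"vg:{gene_id}"


def parse_vg_id(entry: str) -> int:
    """Return the NCBI GeneID from an identifier of the form vg:<gene_id>."""
    prefix, sep, gene_id = entry.partition(":")
    # Reject other organism codes, a missing colon, or a non-numeric GeneID.
    if prefix != "vg" or not sep or not gene_id.isdigit():
        raise ValueError(f"not a vg identifier: {entry!r}")
    return int(gene_id)


print(to_vg_id(43740568))          # vg:43740568
print(parse_vg_id("vg:43740568"))  # 43740568
```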

In order to distinguish individual viruses, the GENOME database contains two types of identifiers. One is the Vtax identifier, which is identical to the NCBI taxonomy ID, as shown in the following taxonomy files. Vtax identifiers cover the entire set of viruses, but they may change when NCBI changes its taxonomy IDs. Therefore, T4 identifiers are additionally assigned to selected viruses relevant to human or plant diseases, as summarized below. Note that a Vtax entry may correspond to multiple virus genome sequences in the RefSeq database. Note also that the identifier T40000 is not an entry of the GENOME database; it simply represents the virus category (vg).

Virus KOs

Virus-specific KOs are defined on the basis of experimental evidence, as summarized in the following BRITE hierarchy file. Most virus KOs are defined at the family or genus level, but some are defined at the species level. In addition, some KOs are defined for the "vp" category of the GENES database, which contains mature peptides cleaved from polyproteins of selected viruses. These distinctions are made to better represent functional orthologs in the pathway maps and other protein interaction networks.
KO      Name                                                          Sequence with experimental evidence
K25001  SARS coronavirus spike protein S1                             vp:43740568-1
K25002  SARS coronavirus spike protein S2                             vp:43740568-2
K24152  SARS coronavirus spike glycoprotein                           vg:43740568
K24324  MERS coronavirus spike glycoprotein                           vg:14254594
K24325  Betacoronavirus (excluding SARS and MERS) spike glycoprotein  vg:39105218
K19254  Coronaviridae (excluding betacoronavirus) spike glycoprotein  vg:918758
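The spike KOs in this table follow a precedence order: the species-level KOs for SARS and MERS take priority, the genus-level KO covers the remaining betacoronaviruses, and the family-level KO covers coronaviruses outside the genus Betacoronavirus. A minimal Python sketch of that precedence, assuming each virus is described by a set of simplified lineage tags ("Coronaviridae", "Betacoronavirus", "SARS", "MERS"); these tags and the function name are hypothetical and are not KEGG or NCBI taxonomy identifiers.

```python
from typing import Optional, Sequence

# Spike-protein KOs copied from the table above.
SPIKE_KO_SARS = "K24152"          # SARS coronavirus (species level)
SPIKE_KO_MERS = "K24324"          # MERS coronavirus (species level)
SPIKE_KO_OTHER_BETA = "K24325"    # Betacoronavirus excluding SARS and MERS
SPIKE_KO_OTHER_CORONA = "K19254"  # Coronaviridae excluding betacoronavirus


def spike_ko(lineage: Sequence[str]) -> Optional[str]:
    """Return the most specific spike KO for a virus with the given (hypothetical) lineage tags."""
    tags = set(lineage)
    if "Coronaviridae" not in tags:
        return None                   # no spike KO from this table applies
    if "Betacoronavirus" not in tags:
        return SPIKE_KO_OTHER_CORONA  # family-level KO
    if "SARS" in tags:
        return SPIKE_KO_SARS          # species-level KOs take precedence
    if "MERS" in tags:
        return SPIKE_KO_MERS
    return SPIKE_KO_OTHER_BETA        # genus-level KO for the remaining betacoronaviruses


print(spike_ko(["Coronaviridae", "Betacoronavirus", "MERS"]))  # K24324
print(spike_ko(["Coronaviridae", "Alphacoronavirus"]))         # K19254
```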

Virus ortholog group (Vog)

Because experimental evidence is scarce, the definition and assignment of KOs for virus genes is necessarily limited: the KO assignment rate is less than 5% in viruses, compared with over 50% in cellular organisms. To supplement KOs, a new effort has been initiated to computationally generate virus ortholog groups (formerly called virus ortholog clusters) using the same annotation resource, the GFIT tables, as the KO assignment tools (see KO assignment tools). Currently, virus ortholog groups are generated by the simple procedure shown below.
  1. The result of the vg-vg comparison is stored in the paralog GFIT table, a variant of which can be viewed from the "Paralog" button on each vg entry page.
  2. Similarity is measured by a modified identity: the identity of the overlap (aligned) region, weighted by min(1, overlap*2/(aalen1+aalen2)), where aalen1 and aalen2 are the amino acid lengths of the two proteins.
  3. For each gene, its GFIT table is used to collect similar genes above a given threshold of modified identity.
  4. The GFIT tables of these similar genes are then used to collect additional similar genes, and this process is repeated until no more genes are added.
  5. This can be viewed as single-linkage clustering of truncated GFIT tables, which are processed in order of decreasing table size.
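The five steps above amount to single-linkage grouping: genes are linked when their modified identity passes the threshold, and each group is grown until nothing new is added. A minimal Python sketch under stated assumptions: pairwise hits (gene1, gene2, identity, overlap) stand in for the GFIT tables, whose actual format is not described here; identity is a fraction between 0 and 1; similarity is treated as symmetric; and a partner is kept when its modified identity is at or above the threshold. All function and variable names are hypothetical.

```python
from collections import defaultdict, deque


def modified_identity(identity, overlap, aalen1, aalen2):
    """Identity of the aligned region weighted by min(1, overlap*2/(aalen1+aalen2))."""
    return identity * min(1.0, overlap * 2 / (aalen1 + aalen2))


def build_vogs(hits, lengths, threshold):
    """Single-linkage grouping over truncated similarity tables (hypothetical sketch).

    hits: iterable of (gene1, gene2, identity, overlap), standing in for GFIT rows
    lengths: dict mapping each gene to its amino acid length
    threshold: minimum modified identity, e.g. 0.3, 0.5 or 0.7
    """
    # Step 3: truncate each gene's table to partners that pass the threshold.
    table = defaultdict(set)
    for g1, g2, identity, overlap in hits:
        if modified_identity(identity, overlap, lengths[g1], lengths[g2]) >= threshold:
            table[g1].add(g2)
            table[g2].add(g1)  # assumption: similarity is treated as symmetric

    # Step 5: process the truncated tables in order of decreasing size.
    groups, assigned = [], set()
    for seed in sorted(table, key=lambda g: len(table[g]), reverse=True):
        if seed in assigned:
            continue
        # Step 4: follow the tables of similar genes until nothing new is added.
        group, queue = {seed}, deque([seed])
        while queue:
            for partner in table[queue.popleft()]:
                if partner not in group:
                    group.add(partner)
                    queue.append(partner)
        assigned |= group
        groups.append(group)
    return groups  # genes with no partner above the threshold join no group


# Toy data (hypothetical genes A-D); identity is a fraction, lengths in amino acids.
lengths = {"A": 100, "B": 110, "C": 90, "D": 300}
hits = [("A", "B", 0.80, 100), ("B", "C", 0.60, 90), ("C", "D", 0.90, 80)]
for t in (0.3, 0.5, 0.7):
    print(t, [sorted(g) for g in build_vogs(hits, lengths, t)])
# 0.3 [['A', 'B', 'C', 'D']]
# 0.5 [['A', 'B', 'C']]
# 0.7 [['A', 'B']]
```

Because similarity is treated as symmetric in this sketch, each group is a connected component, and the processing order only affects the order in which groups are listed. Note how raising the threshold splits the toy group, mirroring how the number of groups grows from the 30% to the 70% threshold.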
Virus ortholog group (Vog) data are shown on the Vog page, linked from the "Vog" button of each vg entry page. Note that Vog numbers begin with 3, 5 or 7 for the threshold values of 30%, 50% and 70%, respectively; they are not stable identifiers and will change when the original RefSeq data are updated.
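As an illustration only, a hypothetical helper can recover the threshold from the leading digit of a Vog number. It assumes the number is available as a plain string of decimal digits; the full Vog number format is not specified here, and the example number is made up.

```python
# Leading digit of a Vog number -> modified-identity threshold (%), as described above.
THRESHOLD_BY_LEADING_DIGIT = {"3": 30, "5": 50, "7": 70}


def vog_threshold(vog_number: str) -> int:
    """Return the clustering threshold (%) encoded in the first digit of a Vog number.

    Vog numbers are not stable across RefSeq updates, so it is safer to record
    them together with the Vog version than to use them as permanent keys.
    """
    number = vog_number.strip()
    if not number.isdigit() or number[0] not in THRESHOLD_BY_LEADING_DIGIT:
        raise ValueError(f"unexpected Vog number: {vog_number!r}")
    return THRESHOLD_BY_LEADING_DIGIT[number[0]]


print(vog_threshold("512345"))  # 50 (hypothetical Vog number)
```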

Vog has also been integrated into the keyword search tool, the taxonomy mapping tool linked from the Vog page, and the KEGG Genome Browser, where it is used to search for conserved gene orders in viral genomes.

KEGG Genome Browser for viruses

The KEGG Genome Browser is now available for viral genomes and is linked from the "Genome browser" button in the Position field of each vg entry page. Here a viral genome is defined by the Vtax identifier (NCBI taxonomy ID) of the GENOME database. When multiple RefSeq (NC number) entries are associated with the same taxonomy ID, each can be selected from the Chr menu.

Virus taxonomy

All the viruses present in KEGG GENES are classified according to the NCBI taxonomy, which is based on the ICTV (International Committee on Taxonomy of Viruses) classification; KEGG supplements this with the traditional Baltimore classification (see br08620). The correspondence between the ICTV classification (realm, kingdom, phylum, class, order and family) and the seven Baltimore classes is shown below.
 Riboviria
   Pararnavirae
     Artverviricota
       Revtraviricetes
         Blubervirales        (VII dsDNA-RT)
         Ortervirales         (VI ssRNA-RT)
           Caulimoviridae     (VII dsDNA-RT)
   Orthornavirae
     Duplornaviricota         (III dsRNA)
     Pisuviricota
       Duplopiviricetes       (III dsRNA)
         Durnavirales
           Hypoviridae        (IV +ssRNA)
       Pisoniviricetes        (IV +ssRNA)
       Stelpaviricetes        (IV +ssRNA)
     Kitrinoviricota          (IV +ssRNA)
     Lenarviricota            (IV +ssRNA)
     Negarnaviricota          (V -ssRNA)

 Ribozyviria                  (V -ssRNA)

 Duplodnaviria                (I dsDNA)

 Varidnaviria                 (I dsDNA)

 Adnaviria                    (I dsDNA)

 Monodnaviria                 (II ssDNA)
   Shotokuvirae
     Cossaviricota
       Papovaviricetes        (I dsDNA)
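The tree can be read as a lookup rule: the deepest taxon carrying a Baltimore annotation determines the class, so Caulimoviridae (VII) overrides Ortervirales (VI), Hypoviridae (IV) overrides Duplopiviricetes (III), and Papovaviricetes (I) overrides Monodnaviria (II). A minimal Python sketch of that rule, assuming a lineage is given as a list of ICTV taxon names ordered from realm downward; the dictionary covers only the taxa annotated in the tree, and the function name is hypothetical rather than part of any KEGG API.

```python
from typing import Optional, Sequence

# Baltimore annotations transcribed from the ICTV tree above.
BALTIMORE = {
    "Blubervirales": "VII dsDNA-RT",
    "Ortervirales": "VI ssRNA-RT",
    "Caulimoviridae": "VII dsDNA-RT",
    "Duplornaviricota": "III dsRNA",
    "Duplopiviricetes": "III dsRNA",
    "Hypoviridae": "IV +ssRNA",
    "Pisoniviricetes": "IV +ssRNA",
    "Stelpaviricetes": "IV +ssRNA",
    "Kitrinoviricota": "IV +ssRNA",
    "Lenarviricota": "IV +ssRNA",
    "Negarnaviricota": "V -ssRNA",
    "Ribozyviria": "V -ssRNA",
    "Duplodnaviria": "I dsDNA",
    "Varidnaviria": "I dsDNA",
    "Adnaviria": "I dsDNA",
    "Monodnaviria": "II ssDNA",
    "Papovaviricetes": "I dsDNA",
}


def baltimore_class(lineage: Sequence[str]) -> Optional[str]:
    """Return the Baltimore class for a lineage ordered from realm downward."""
    result = None
    for taxon in lineage:
        # A deeper annotated taxon overrides any annotation seen higher up.
        result = BALTIMORE.get(taxon, result)
    return result


print(baltimore_class(["Riboviria", "Pararnavirae", "Artverviricota",
                       "Revtraviricetes", "Ortervirales", "Caulimoviridae"]))  # VII dsDNA-RT
print(baltimore_class(["Monodnaviria", "Shotokuvirae", "Cressdnaviricota"]))  # II ssDNA
```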

Brite hierarchies and tables for viruses

Virus specific Brite hierarchy files and Brite table files are being developed.

Category                   Brite file
Functional classification  03200  Viral proteins (well-defined viral KOs)
                           03210  Viral fusion proteins
Taxonomy                   08620  KEGG viruses in the NCBI taxonomy (ICTV and Baltimore classifications)
                           08621  KEGG viruses in taxonomic ranks (see KEGG Taxonomy)
Virus information          08622  KEGG selected viruses (T4 genomes of viral pathogens)
                           08623  Viruses in pathway maps
Virus-cell interaction     03220  Virus entry
                           03222  Virus entry: animal viruses
                           03223  Viral network perturbations
Disease                    08401  Infectious diseases (contains viral infections)
Drug                       08307  Antimicrobials (contains antivirals and targets)
Comparison with            01611  RNA polymerase
cellular organisms         01612  DNA polymerase

Pathways and networks for viruses

The pathway maps for viral infections are interaction networks of both human proteins (colored in green and linked to human gene entries) and viral proteins (colored in blue and linked to virus KOs).

Reference
  1. Jin, Z., Sato, Y., Kawashima, M., and Kanehisa, M.; KEGG tools for classification and analysis of viral proteins. Protein Sci. 32, e4820 (2023). Note that Voc (Virus ortholog cluster) described in the paper is now called Vog (Virus ortholog group).

Last updated: March 29, 2024