Voc (Virus ortholog cluster) can be searched with protein name, virus name, virus family name, Baltimore class and K number. Try, for example, "RNA polymerase" or "RNA polymerase -ssRNA".
Current statistics (2023/12/1)
Number of viral genes (vg entries) | 688,927 |
Number of viral mature peptides (vp entries) | 377 |
Number of vg/vp entries with assigned KOs | 46,629 |
Number of KOs assigned to vg/vp entries | 1,409 |
Number of virus-specific KOs | 984 |
Voc version 2023-10-16 taken from RefSeq 220 (Sep 2023)
Voc threshold | 30% | 50% | 70% |
Number of clusters | 50,667 | 76,305 | 87,391 |
Number of proteins in clusters | 605,698 | 551,181 | 494,772 |
Total number of proteins | 676,637 |
KEGG Virus Resource
KEGG Virus is an interface to virus data that are part of the GENES, KO, GENOME, BRITE, PATHWAY, NETWORK, DISEASE and DRUG databases in KEGG. It also contains a virus-specific dataset of virus ortholog clusters.
Virus genes and genomes
The virus category of the GENES and GENOME databases is generated from the bimonthly release of the NCBI RefSeq database. RefSeq GeneIDs are used as gene identifiers, and the organism code "vg" (with the T40000 category identifier) is used for the entire set of virus genes. Thus, each virus gene in KEGG is identified by vg:<gene_id> where <gene_id> is the NCBI GeneID.
In order to distinguish individual viruses, the GENOME database contains two types of identifiers. One is the Vtax identifier, which is the same as the NCBI taxonomy ID, as shown in the following taxonomy files.
- 08620 KEGG viruses in the NCBI taxonomy
- 08621 KEGG viruses in taxonomic ranks (fixed levels of taxonomic ranks taken from 08620)
- 08622 KEGG selected viruses
Virus KOs
Virus-specific KOs are defined on the basis of experimental evidence, as summarized in the following BRITE hierarchy file.
- 03200 Viral proteins
KO | Name | Sequence with experimental evidence |
K25001 | SARS coronavirus spike protein S1 | vp:43740568-1 |
K25002 | SARS coronavirus spike protein S2 | vp:43740568-2 |
K24152 | SARS coronavirus spike glycoprotein | vg:43740568 |
K24324 | MERS coronavirus spike glycoprotein | vg:14254594 |
K24325 | Betacoronavirus (excluding SARS and MERS) spike glycoprotein | vg:39105218 |
K19254 | Coronaviridae (excluding betacoronavirus) spike glycoprotein | vg:918758 |
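The entry identifiers in the table above can be split into their parts with a small parser. This is a minimal sketch, not a KEGG tool: the names `KeggVirusId` and `parse_virus_id` are hypothetical, and the vp form `vp:<gene_id>-<n>` (a numeric suffix after the GeneID) is an assumption inferred only from the two vp rows shown above.

```python
import re
from typing import NamedTuple, Optional


class KeggVirusId(NamedTuple):
    """Hypothetical container for a parsed vg/vp entry identifier."""
    prefix: str                   # "vg" (viral gene) or "vp" (mature peptide)
    gene_id: int                  # NCBI GeneID
    peptide_index: Optional[int]  # numeric suffix on vp entries, None for vg


# Assumption: vg entries are "vg:<digits>", vp entries are "vp:<digits>-<digits>".
_ID_PATTERN = re.compile(r"^(vg|vp):(\d+)(?:-(\d+))?$")


def parse_virus_id(text: str) -> KeggVirusId:
    """Parse 'vg:<gene_id>' or 'vp:<gene_id>-<n>' into its components."""
    match = _ID_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a vg/vp identifier: {text!r}")
    prefix, gene_id, suffix = match.groups()
    if prefix == "vg" and suffix is not None:
        raise ValueError(f"vg identifier with unexpected suffix: {text!r}")
    if prefix == "vp" and suffix is None:
        raise ValueError(f"vp identifier without peptide suffix: {text!r}")
    return KeggVirusId(prefix, int(gene_id), int(suffix) if suffix else None)


if __name__ == "__main__":
    for entry in ("vg:43740568", "vp:43740568-1", "vp:43740568-2"):
        print(entry, "->", parse_virus_id(entry))
```

Grouping on `gene_id` then relates each mature peptide entry to the gene entry that shares its GeneID, as with vp:43740568-1, vp:43740568-2 and vg:43740568 above.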
Virus ortholog cluster (Voc)
Due to the lack of experimental evidence, the definition and assignment of KOs for virus genes remain very limited. The KO assignment rate is less than 5% in viruses, while it is over 50% in cellular organisms.
In order to supplement KOs, a new attempt has been initiated to computationally generate virus ortholog clusters using the same annotation resource of GFIT tables (see KO assignment tools).
Currently, virus ortholog clusters are generated by a simple procedure shown below.
- The result of vg-vg comparison is stored in the paralog GFIT table, a variant form of which may be viewed from the "Paralog" button of each vg entry page.
- The measure of similarity is defined by a modified identity with weighting of min(1, overlap*2/(aalen1+aalen2)) for the identity of the overlap (aligned) region.
- For each gene its GFIT table is used to collect similar genes above a given threshold of modified identity.
- GFIT tables of similar genes are then used to collect additional similar genes and this process is repeated until no addition is made.
- This can be viewed as single-linkage clustering of truncated GFIT tables, which are processed in order of decreasing table size.
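The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the KEGG implementation: the input is a hypothetical list of pairwise hits standing in for GFIT tables, identity is given as a fraction between 0 and 1, similarity is treated as symmetric, and genes with no hit above the threshold are simply left out of the result. The function names and the `vg:1` style identifiers are made up for the example.

```python
from collections import defaultdict


def modified_identity(identity, overlap, aalen1, aalen2):
    """Identity of the aligned region, down-weighted for short overlaps.

    Weight is min(1, overlap*2/(aalen1+aalen2)), as stated in the text.
    """
    weight = min(1.0, overlap * 2 / (aalen1 + aalen2))
    return identity * weight


def cluster(hits, threshold):
    """Single-linkage clustering of pairwise hits.

    hits: iterable of (gene_a, gene_b, identity, overlap, aalen_a, aalen_b).
    Returns a list of sorted member lists, one per cluster.
    """
    # Truncated "tables": for each gene, the genes above the threshold.
    # Assumption: a hit a->b also counts as b->a.
    neighbours = defaultdict(set)
    for a, b, identity, overlap, len_a, len_b in hits:
        if a != b and modified_identity(identity, overlap, len_a, len_b) >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Process genes in order of decreasing table size (ties broken by name).
    order = sorted(neighbours, key=lambda g: (-len(neighbours[g]), g))
    assigned = set()
    clusters = []
    for seed in order:
        if seed in assigned:
            continue
        members = {seed}
        frontier = [seed]
        # Keep pulling in tables of newly added genes until nothing is added.
        while frontier:
            gene = frontier.pop()
            for other in neighbours[gene]:
                if other not in members:
                    members.add(other)
                    frontier.append(other)
        assigned |= members
        clusters.append(sorted(members))
    return clusters


if __name__ == "__main__":
    example_hits = [
        ("vg:1", "vg:2", 0.9, 100, 100, 100),
        ("vg:2", "vg:3", 0.8, 100, 100, 120),
        ("vg:3", "vg:4", 0.9, 40, 100, 300),   # short overlap, heavily down-weighted
        ("vg:5", "vg:6", 0.6, 90, 100, 100),
    ]
    for t in (0.3, 0.5, 0.7):
        print(t, cluster(example_hits, t))
```

Because linkage is single, vg:1 and vg:3 end up in one cluster through vg:2 even though they are never compared directly, and raising the threshold can only split or shrink clusters.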
Voc has also been integrated in the keyword search tool shown above, the taxonomy mapping tool linked from the Voc page and the KEGG Genome Browser for searching conserved gene orders in viral genomes.
KEGG Genome Browser for viruses
KEGG Genome Browser is now available for viral genomes and is linked from the "Genome browser" button in the Position field of each vg entry page such as the following:
Here a viral genome is defined by the Vtax identifier (NCBI taxonomy ID) of the GENOME database. When multiple RefSeq (NC number) entries are associated with the same taxonomy ID, each can be selected from the Chr menu.
Virus taxonomy
All the viruses present in KEGG GENES are classified according to the NCBI taxonomy, which is based on the ICTV (International Committee on Taxonomy of Viruses) classification, supplemented by KEGG with the traditional Baltimore classification (see br08620). The correspondence between the ICTV realm, kingdom, phylum, class, order, family classification and the seven types of Baltimore classification is shown below.
- Riboviria
  - Pararnavirae
    - Artverviricota
      - Revtraviricetes
        - Blubervirales (VII dsDNA-RT)
        - Ortervirales (VI ssRNA-RT)
          - Caulimoviridae (VII dsDNA-RT)
  - Orthornavirae
    - Duplornaviricota (III dsRNA)
    - Pisuviricota
      - Duplopiviricetes (III dsRNA)
        - Durnavirales
          - Hypoviridae (IV +ssRNA)
      - Pisoniviricetes (IV +ssRNA)
      - Stelpaviricetes (IV +ssRNA)
    - Kitrinoviricota (IV +ssRNA)
    - Lenarviricota (IV +ssRNA)
    - Negarnaviricota (V -ssRNA)
- Ribozyviria (V -ssRNA)
- Duplodnaviria (I dsDNA)
- Varidnaviria (I dsDNA)
- Adnaviria (I dsDNA)
- Monodnaviria (II ssDNA)
  - Shotokuvirae
    - Cossaviricota
      - Papovaviricetes (I dsDNA)
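One way to use this correspondence programmatically is a lookup that walks a lineage from its most specific rank upward and returns the first Baltimore class it finds. The sketch below is only an illustration: the rule that the most specific annotated taxon wins (for example, Caulimoviridae overriding Ortervirales) is an assumption about how to read the correspondence, and the names `BALTIMORE_BY_TAXON` and `baltimore_class` are hypothetical. The table entries are copied from the correspondence above.

```python
# Taxa that carry a Baltimore class in the correspondence above.
BALTIMORE_BY_TAXON = {
    "Blubervirales": "VII dsDNA-RT",
    "Ortervirales": "VI ssRNA-RT",
    "Caulimoviridae": "VII dsDNA-RT",
    "Duplornaviricota": "III dsRNA",
    "Duplopiviricetes": "III dsRNA",
    "Hypoviridae": "IV +ssRNA",
    "Pisoniviricetes": "IV +ssRNA",
    "Stelpaviricetes": "IV +ssRNA",
    "Kitrinoviricota": "IV +ssRNA",
    "Lenarviricota": "IV +ssRNA",
    "Negarnaviricota": "V -ssRNA",
    "Ribozyviria": "V -ssRNA",
    "Duplodnaviria": "I dsDNA",
    "Varidnaviria": "I dsDNA",
    "Adnaviria": "I dsDNA",
    "Monodnaviria": "II ssDNA",
    "Papovaviricetes": "I dsDNA",
}


def baltimore_class(lineage):
    """Return the Baltimore class for a lineage ordered from realm to family.

    Assumption: the most specific taxon with an annotation takes precedence.
    Returns None when no taxon in the lineage is annotated.
    """
    for taxon in reversed(lineage):
        if taxon in BALTIMORE_BY_TAXON:
            return BALTIMORE_BY_TAXON[taxon]
    return None


if __name__ == "__main__":
    caulimo = ["Riboviria", "Pararnavirae", "Artverviricota",
               "Revtraviricetes", "Ortervirales", "Caulimoviridae"]
    print(baltimore_class(caulimo))        # family-level exception
    print(baltimore_class(caulimo[:-1]))   # order-level default
```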
Brite hierarchies and tables for viruses
Virus-specific Brite hierarchy files and Brite table files are being developed.
Category | Brite file |
Functional classification | 03200 Viral proteins (well-defined viral KOs) |
Functional classification | 03210 Viral fusion proteins |
Taxonomy | 08620 KEGG viruses in the NCBI taxonomy (ICTV and Baltimore classifications) |
Taxonomy | 08621 KEGG viruses in taxonomic ranks (see KEGG Taxonomy) |
Taxonomy | 08622 KEGG selected viruses (T4 genomes of viral pathogens) |
Virus-cell interaction | 03220 Virus entry |
Virus-cell interaction | 03222 Virus entry: animal viruses |
Virus-cell interaction | 03223 Viral network perturbations |
Disease | 08401 Infectious diseases (contains viral infections) |
Drug | 08307 Antimicrobials (contains antivirals and targets) |
Comparison with cellular organisms | 01611 RNA polymerase |
Comparison with cellular organisms | 01612 DNA polymerase |
Pathways and networks for viruses
The pathway maps for viral infections are interaction networks of both human proteins (colored in green and linked to human gene entries) and viral proteins (colored in blue and linked to virus KOs).
Reference
- Jin, Z., Sato, Y., Kawashima, M., and Kanehisa, M.; KEGG tools for classification and analysis of viral proteins. Protein Sci. 33 (2024). [pubmed] [doi]
Last updated: November 1, 2023