How KEGG can be used for genome/metagenome annotation

The KEGG database contains three main components for genome/metagenome annotation:
  • the collection of internally annotated gene catalogs for the complete genomes (called KEGG organisms) and additional protein sequences in the KEGG GENES database
  • the knowledge base of high-level functions represented as the molecular interaction, reaction and relation networks in the KEGG PATHWAY, BRITE and MODULE databases, and
  • the knowledge base of molecular-level functions associated with ortholog groups in the KO database, where most KO entries are defined in a context-dependent manner as nodes of the KEGG molecular networks.
In general, KO entries (identified by K numbers) also represent sequence similarity groups. Thus, a sequence similarity search of a query genome against KEGG GENES is a search for the most appropriate K numbers. The assigned set of K numbers can then be used to reconstruct KEGG pathway maps, BRITE hierarchies and KEGG modules, enabling interpretation of high-level functions (see BlastKOALA and GhostKOALA as examples).
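The idea of turning similarity-search hits into a set of K numbers can be sketched as follows. This is a minimal illustration only, not the scoring scheme used by BlastKOALA or GhostKOALA: the hit list, the query gene names, the scores, the `min_score` threshold and the function name `assign_k_numbers` are all hypothetical, and a simple best-hit rule is assumed.

```python
# Hypothetical similarity-search hits against KEGG GENES:
# (query gene, K number of the matched entry, alignment score).
# Gene names and scores are made up for illustration.
hits = [
    ("query_gene_1", "K14579", 250.0),
    ("query_gene_1", "K14580", 90.0),
    ("query_gene_2", "K14578", 410.0),
    ("query_gene_3", "K14578", 120.0),
]


def assign_k_numbers(hits, min_score=100.0):
    """Keep the highest-scoring K number per query gene (simple best-hit rule).

    Hits scoring below min_score are ignored. The threshold is an assumption
    of this sketch, not a KEGG parameter.
    """
    best = {}
    for gene, k_number, score in hits:
        if score < min_score:
            continue
        if gene not in best or score > best[gene][1]:
            best[gene] = (k_number, score)
    return {gene: k_number for gene, (k_number, _) in best.items()}


assignments = assign_k_numbers(hits)
# The set of assigned K numbers is what would be mapped onto
# pathway maps, BRITE hierarchies and modules.
k_number_set = set(assignments.values())
```

The resulting `k_number_set` is the only input needed for the downstream reconstruction step; which gene carried which K number matters for the ortholog table but not for pathway or module mapping.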

Continuous efforts are being made to expand the repertoire of KO entries and to improve the collection of KEGG modules (identified by M numbers) for automated interpretation of high-level functions.

Ortholog Table

The ortholog table (OT) has been available since the beginning of the KEGG project. For a given set of K numbers, it displays the current assignment of genes in KEGG organisms.

Example input (K numbers): K14579 K14580 K14578 K14581 K14582 K14583 K18242 K18243
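The layout of an ortholog table, organisms as rows and K numbers as columns with the assigned genes in each cell, can be sketched as below. The organism codes, gene identifiers and the function names `ortholog_table` and `format_table` are hypothetical, and the assignments are invented; the real table is generated from the current KEGG GENES annotation.

```python
# Hypothetical gene-to-KO assignments: (organism code, gene identifier, K number).
gene_assignments = [
    ("orgA", "A_0001", "K14579"),
    ("orgA", "A_0002", "K14580"),
    ("orgB", "B_0101", "K14579"),
    ("orgB", "B_0102", "K14579"),
    ("orgC", "C_0007", "K18242"),
]


def ortholog_table(gene_assignments, k_numbers):
    """Return {organism: {K number: [genes]}} for the requested K numbers."""
    wanted = set(k_numbers)
    table = {}
    for org, gene, k_number in gene_assignments:
        if k_number in wanted:
            # Create an empty row covering every requested K number on first use.
            row = table.setdefault(org, {k: [] for k in k_numbers})
            row[k_number].append(gene)
    return table


def format_table(table, k_numbers):
    """Render the table as tab-separated text, using '-' for empty cells."""
    lines = ["\t".join(["organism"] + list(k_numbers))]
    for org in sorted(table):
        cells = [",".join(table[org][k]) or "-" for k in k_numbers]
        lines.append("\t".join([org] + cells))
    return "\n".join(lines)


query = ["K14579", "K14580"]
table = ortholog_table(gene_assignments, query)
print(format_table(table, query))
```

Organisms with no gene assigned to any of the requested K numbers (here `orgC`) are simply absent from the output in this sketch.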

Taxonomic Distribution

The taxonomic distribution of each KO or module can be viewed from its entry page through the "Taxonomy" or "Virus taxonomy" button, which links to the NCBI taxonomy for cellular organisms (br08611) or viruses (br08621).

The taxonomic distribution of a combination of KOs and modules in cellular organisms can be examined with the taxonomy mapping tool in the KEGG Taxonomy page.

Module Table

The module table (MT) is another way of showing the taxonomic distribution of a combination of modules and KOs. For a given set of M and/or K numbers it identifies organisms that contain complete modules and/or KO groups. The list of organisms may be collapsed into broader organism groups.

Example input (M/K numbers): M00595 K16952 M00596
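The selection logic behind a module table can be sketched as follows, under simplifying assumptions. Here a module is modelled as a list of steps, each step being a set of alternative K numbers, and a module counts as complete when every step has at least one K number present. Actual KEGG module definitions are richer logical expressions, and the module identifiers, their step structure, the organism codes and the function names below are all hypothetical.

```python
# Hypothetical module definitions: a list of steps per module,
# each step a set of alternative K numbers (any one of them suffices).
modules = {
    "M_demo1": [{"K14579"}, {"K14580", "K14578"}],
    "M_demo2": [{"K14581"}, {"K14582"}],
}

# Hypothetical K number sets assigned to each organism.
organism_kos = {
    "orgA": {"K14579", "K14578", "K14581"},
    "orgB": {"K14579", "K14580", "K14581", "K14582"},
}


def module_is_complete(steps, kos):
    """A module is complete if every step shares at least one K number with kos."""
    return all(step & kos for step in steps)


def organisms_with_all(queries, modules, organism_kos):
    """Return organisms satisfying every query item (module ID or K number)."""
    selected = []
    for org, kos in sorted(organism_kos.items()):
        satisfied = all(
            module_is_complete(modules[q], kos) if q in modules else q in kos
            for q in queries
        )
        if satisfied:
            selected.append(org)
    return selected


both_modules = organisms_with_all(["M_demo1", "M_demo2"], modules, organism_kos)
module_and_ko = organisms_with_all(["M_demo1", "K14581"], modules, organism_kos)
```

Collapsing the resulting organism list into broader groups would then be a matter of mapping each organism code to its taxonomic group and merging rows, which is omitted here.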

Annotation Guide

The KEGG Annotation Guide is a collection of HTML tables, called BRITE tables, showing summary views of the current annotation of the KEGG GENES database, such as how K numbers are defined and assigned for distinguishing related genes and for comparing different subunit structures.

  • Comparing subunit structures or gene sets
  • Distinguishing related genes

Signature KOs and Modules

Another set of BRITE tables contains signature KOs and/or signature modules, which can be used to infer phenotypic features of organisms.

  • Metabolic capacity
  • Pathogenicity and drug resistance

Last updated: July 1, 2022