KO Database of Molecular Functions

The KO (KEGG Orthology) database is a database of molecular functions represented in terms of functional orthologs. A functional ortholog is manually defined in the context of KEGG molecular networks, namely KEGG pathway maps, BRITE hierarchies and KEGG modules, and is given a KO identifier called a K number. Most KOs are defined from experimentally characterized genes and proteins in specific organisms, and are then generalized to other organisms based on sequence similarity. The granularity of "function" is context-dependent: the resulting KO grouping may correspond to a group of highly similar sequences within a limited organism group, or it may be a more divergent group.

The term KO system is used for a network-based classification of KOs shown below:
00001 KEGG Orthology (KO)
It consists of six top categories (09100 to 09160) for KEGG pathway maps and one top category (09180) for BRITE hierarchies, as well as one top category (09190) for those KOs that are not yet included in either of them. The category numbers for these top categories and the second-level categories under metabolism (09101 to 09112) are used to define color coding of functions (see KEGG Color Codes).
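
The numbering scheme described above can be sketched as a small lookup. This is only an illustration: the function name `ko_top_category_kind` and the returned labels are assumptions, and the six individual pathway category numbers are filled in from the public 00001 hierarchy as commonly listed, so they should be checked against the current release.

```python
# Top-category numbers of the KO system, grouped by the kind of network they cover.
# Assumption: the six pathway categories are the numbers below (09110 is a
# second-level metabolism category, not a top category).
PATHWAY_TOP_CATEGORIES = {"09100", "09120", "09130", "09140", "09150", "09160"}
BRITE_TOP_CATEGORY = "09180"
NOT_INCLUDED_TOP_CATEGORY = "09190"


def ko_top_category_kind(category: str) -> str:
    """Classify a five-digit KO system category number (hypothetical helper)."""
    if len(category) != 5 or not category.isdigit():
        raise ValueError(f"expected a five-digit category number, got {category!r}")
    if category in PATHWAY_TOP_CATEGORIES:
        return "pathway"
    if category == BRITE_TOP_CATEGORY:
        return "brite"
    if category == NOT_INCLUDED_TOP_CATEGORY:
        return "not_included"
    # Anything else is not one of the eight top categories.
    return "unknown"
```

Category numbers are kept as strings so that the leading zero is preserved.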

Efforts have been made to associate KO entries with publication records reporting experimental evidence of functionally characterized sequence data, as shown in the SEQUENCE field of the KO entry page. In many cases such data are not available for genes and proteins in the KEGG organisms with completely sequenced genomes. Thus, the addendum (ag) category was introduced in the GENES database, enabling functionally characterized individual protein sequences to be included in KEGG. As a byproduct of these efforts, sequence data have also been associated with EC numbers in Enzyme Nomenclature.

Genome Annotation in KEGG

Genome annotation in KEGG contains two unique aspects, KO assignment and KEGG mapping, as summarized below.

KO assignment
  • Molecular functions are stored in the KO (KEGG Orthology) database containing orthologs of experimentally characterized genes/proteins.
  • Genome annotation in KEGG assigns KO identifiers (K numbers) to individual genes in the genome, rather than giving text descriptions of functions.
KEGG mapping
  • Cellular and organism-level functions are stored in the PATHWAY, BRITE and MODULE databases in terms of the molecular networks, which are all created as networks of K number nodes.
  • The KO assignment procedure converts a gene set in the genome to a K number set and leads to automatic reconstruction of KEGG pathways and other networks by the process called KEGG mapping, enabling interpretation of high-level functions.
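
The two steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the KEGG implementation: the function name `kegg_mapping`, the gene names, the K numbers and the network contents are all made-up placeholders, and each network is simplified to a flat set of K number nodes.

```python
from typing import Dict, Set


def kegg_mapping(gene_to_ko: Dict[str, str],
                 network_nodes: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return, for each network, the K number nodes found in the genome.

    gene_to_ko    : result of KO assignment (genes without a KO are simply absent)
    network_nodes : K number nodes of each pathway map, BRITE hierarchy or module
    """
    # Step 1: convert the gene set of the genome into a K number set.
    genome_k_numbers = set(gene_to_ko.values())

    # Step 2: intersect that set with the nodes of every network and keep
    # only the networks in which at least one node is present.
    reconstructed = {}
    for name, nodes in network_nodes.items():
        present = nodes & genome_k_numbers
        if present:
            reconstructed[name] = present
    return reconstructed


# Toy data (placeholders, not real KEGG content).
gene_to_ko = {"geneA": "K90001", "geneB": "K90002", "geneC": "K90001"}
network_nodes = {
    "toy_pathway_1": {"K90001", "K90002", "K90003"},
    "toy_pathway_2": {"K90004"},
}
reconstructed = kegg_mapping(gene_to_ko, network_nodes)
```

Because the networks are defined on K numbers rather than on genes, the same network definitions can be reused for any genome once its genes have been assigned K numbers.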

KO Assignment Tools

Two computational tools have been developed for internal annotation of the GENES database: the earlier KOALA (KEGG Orthology And Links Annotation) tool and the newly developed KoAnn (KO Annotation) tool. Both tools process GFIT tables generated from the SSDB database of SSEARCH computation results for all pairwise genome comparisons. Currently, both automatic and manual versions of KoAnn are used for all annotations. BlastKOALA is a web server for automatic KO assignment that applies the KoAnn algorithm to BLAST searches against a limited set of GENES data.

| | KOALA / KoAnn | BlastKOALA |
|---|---|---|
| Purpose | Internal GENES annotation | Outside service of genome annotation |
| Search program | SSEARCH | BLASTP |
| Scoring | Weighted sum of SW scores (KOALA scoring) or identity scores (KoAnn scoring) | Weighted sum of BLAST bit scores (KoAnn scoring) |
| Database | Entire GENES database sequences | KEGG Reference genomes and functionally characterized sequences linked from KO references |

KOALA scoring includes: SW (Smith-Waterman) score, best-best flag, overlap of alignment, ratio of query and DB sequences, taxonomic category and Pfam domains.
KoAnn scoring includes: identity score, sequence length and best-best flag.
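
The idea of a weighted sum of scores can be illustrated as follows. This is a hedged sketch only: the actual weights used by KOALA and KoAnn are not given here, so the `Hit` record, the `hit_weight` formula (length ratio times a best-best bonus of 1.5) and the `assign_ko` function are hypothetical stand-ins that merely combine the listed ingredients.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Hit:
    ko: str           # K number of the database sequence that was hit
    score: float      # per-hit score, e.g. identity score or BLAST bit score
    best_best: bool   # True if query and hit are bidirectional best hits
    query_len: int    # length of the query sequence (assumed > 0)
    target_len: int   # length of the database sequence (assumed > 0)


def hit_weight(hit: Hit) -> float:
    """Hypothetical weight: penalize length mismatch, reward best-best hits."""
    length_ratio = min(hit.query_len, hit.target_len) / max(hit.query_len, hit.target_len)
    best_best_bonus = 1.5 if hit.best_best else 1.0  # assumed value
    return length_ratio * best_best_bonus


def assign_ko(hits: Iterable[Hit]) -> Optional[str]:
    """Return the K number with the largest weighted score sum, or None if no hits."""
    totals = defaultdict(float)
    for hit in hits:
        totals[hit.ko] += hit_weight(hit) * hit.score
    if not totals:
        return None
    return max(totals, key=totals.get)


# Toy hits with placeholder K numbers.
hits = [
    Hit("K90001", 200.0, True, 300, 300),
    Hit("K90002", 250.0, False, 300, 150),
    Hit("K90001", 100.0, False, 300, 280),
]
best_ko = assign_ko(hits)
```

Summing over all hits per K number, rather than taking the single top hit, lets several consistent moderate hits outweigh one isolated high-scoring hit.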


References
  1. Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M., and Tanabe, M.; KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457-D462 (2016).
  2. Kanehisa, M., Sato, Y., and Morishima, K.; BlastKOALA and GhostKOALA: KEGG tools for functional characterization of genome and metagenome sequences. J. Mol. Biol. 428, 726-731 (2016).

Last updated: September 1, 2024