KEGG Mapping

KEGG mapping as a set operation

KEGG mapping is the process of mapping elementary datasets (genes, proteins, small molecules, etc.) onto network datasets (KEGG pathway maps, BRITE functional hierarchies, and KEGG modules). It is not simply an enrichment process; rather, it is a set operation that generates a new set. From the beginning of the KEGG project, the basic idea has been to generate organism-specific pathways automatically by a set operation between manually annotated genome data and manually created pathway maps. The KEGG mapping set operation has thus served to extend the KEGG knowledge base. It has also played another important role: assisting the integration and interpretation of users' datasets, especially large-scale datasets generated by high-throughput technologies (see: KEGG Mapper tools). Currently there are four types of mapping operations available in KEGG:
  1. pathway mapping
  2. brite mapping
  3. module mapping
  4. taxonomy mapping
The fourth type may involve molecular or non-molecular datasets (orthologs, modules, and organisms) and the network dataset (taxonomic tree).
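
The set operation can be pictured as intersecting an elementary dataset with the member set of each network object. The following is a minimal Python sketch of that idea only. The identifiers (K1, pathway_A, and so on) and the function name map_to_networks are illustrative placeholders, not actual KEGG data or a KEGG API.

```python
def map_to_networks(elements, networks):
    """Return, for each network object, the subset of `elements` it contains.

    `networks` maps a network identifier (pathway map, BRITE hierarchy, or
    module) to the set of elementary identifiers it is built from.
    Networks with no matching element are omitted from the result.
    """
    query = set(elements)
    mapped = {}
    for network_id, members in networks.items():
        hits = query & set(members)  # the set operation: intersection
        if hits:
            mapped[network_id] = hits
    return mapped


# Hypothetical toy data, not taken from KEGG.
networks = {
    "pathway_A": {"K1", "K2", "K3"},
    "pathway_B": {"K3", "K4"},
    "module_C": {"K5"},
}
result = map_to_networks(["K2", "K3", "K9"], networks)
```

The result is itself a new set of network objects, each restricted to the query elements, which is what distinguishes the mapping from a purely statistical enrichment score.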

Knowledge base extension

Here is a summary of how the knowledge base of KEGG pathway maps and BRITE functional hierarchies is extended by the KEGG mapping set operations.

1. Organism-specific pathway/brite/module datasets
Organism-specific versions are created for KEGG pathway maps, BRITE functional hierarchies, and KEGG modules through the KEGG Orthology (KO) system, either as static files in the daily database update procedure (for the well-annotated genomes in KEGG GENES) or as temporary files on the fly (for GhostKOALA-annotated MGENES). The organism-specific pathway maps and module maps are colored in green, which is a KEGG convention.
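
As a rough illustration of how the KO system links a reference pathway to one organism, the sketch below treats a reference pathway as a set of KO identifiers and a genome annotation as a gene-to-KO table. The data, the function names, and the use of None for uncolored nodes are assumptions made for this example; only the green coloring convention comes from the text above.

```python
GREEN = "green"  # KEGG convention for organism-specific maps


def organism_specific_pathway(reference_kos, gene_to_ko):
    """Return {KO: sorted organism genes} for reference KOs found in the genome."""
    present = {}
    for gene, ko in gene_to_ko.items():
        if ko in reference_kos:
            present.setdefault(ko, []).append(gene)
    return {ko: sorted(genes) for ko, genes in present.items()}


def node_colors(reference_kos, present):
    """Color a reference node green if the organism has a gene for its KO."""
    return {ko: (GREEN if ko in present else None) for ko in sorted(reference_kos)}


# Hypothetical toy data, not taken from KEGG.
reference_kos = {"K1", "K2", "K3"}
gene_to_ko = {"geneA": "K1", "geneB": "K1", "geneC": "K3", "geneD": "K7"}

present = organism_specific_pathway(reference_kos, gene_to_ko)
colors = node_colors(reference_kos, present)
```

In this toy case K2 stays uncolored because no gene in the organism is assigned to it, and geneD is dropped because its KO is not part of the reference pathway.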

2. Human pathway/brite datasets with disease genes and drug targets
On top of the human pathway maps and BRITE functional hierarchies, all known disease genes accumulated in KEGG DISEASE and all known drug targets accumulated in KEGG DRUG are mapped and displayed in pink and light blue, respectively. The static pathway maps and BRITE hierarchies are identified by the special organism code "hsadd" and the extension "_dd", respectively, so that they can be viewed and searched in the same way as those of any other KEGG organism.
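
The overlay of disease genes and drug targets can be sketched as two membership tests per gene. This is only an illustration: the gene names are placeholders, the function name is hypothetical, and returning both colors for a gene that is both a disease gene and a drug target is an assumption of this sketch, since the text does not say how such genes are displayed.

```python
DISEASE_COLOR = "pink"
TARGET_COLOR = "light blue"


def color_human_genes(pathway_genes, disease_genes, drug_targets):
    """Return {gene: list of colors} for the genes on a human pathway map.

    Assumption: a gene in both sets receives both colors; a gene in
    neither set receives an empty list.
    """
    colors = {}
    for gene in pathway_genes:
        marks = []
        if gene in disease_genes:
            marks.append(DISEASE_COLOR)
        if gene in drug_targets:
            marks.append(TARGET_COLOR)
        colors[gene] = marks
    return colors


# Hypothetical toy data, not taken from KEGG DISEASE or KEGG DRUG.
overlay = color_human_genes(
    pathway_genes=["g1", "g2", "g3", "g4"],
    disease_genes={"g1", "g3"},
    drug_targets={"g2", "g3"},
)
```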

3. Disease dataset with gene/genome information
On top of the BRITE hierarchy files for disease classifications, additional information is computationally added as an extra column: known disease genes for the human disease classification (extension "_gene") and known pathogen genomes for the infectious disease classification (extension "_genome").

4. Drug dataset with molecular network information
On top of the BRITE hierarchy files for various drug classifications, additional information is computationally included either in an additional column or an additional hierarchy level. Examples include drug classifications with metabolizing enzyme data (extension "_enzyme") or target data (extension "_target").
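
Adding an extra column to a classification hierarchy amounts to a lookup on the entry identifier of each row. The sketch below assumes a simplified in-memory representation, (level, entry identifier or None, name), which is not the actual BRITE file format; the identifiers and the function name add_column are likewise placeholders.

```python
def add_column(hierarchy_rows, annotations):
    """Append one annotation column to each hierarchy row.

    `hierarchy_rows` is a list of (level, entry_id, name) tuples, where
    category rows have entry_id None. `annotations` maps an entry_id to
    a collection of annotation strings (for example, targets).
    """
    out = []
    for level, entry_id, name in hierarchy_rows:
        if entry_id is None:
            extra = ""  # category rows carry no annotation
        else:
            extra = ", ".join(sorted(annotations.get(entry_id, ())))
        out.append((level, entry_id, name, extra))
    return out


# Hypothetical toy data, not taken from KEGG DRUG.
rows = [
    (1, None, "Class X"),
    (2, "D_1", "drug one"),
    (2, "D_2", "drug two"),
]
targets = {"D_1": {"T_b", "T_a"}}
annotated = add_column(rows, targets)
```

Adding an extra hierarchy level instead of a column would follow the same lookup, with each annotation emitted as a child row one level deeper.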

5. Drug labels integrated into KEGG
The BRITE hierarchy files for drug classifications may also be used to integrate drug labels (package inserts), once they are properly linked to KEGG DRUG D numbers. Examples include drug labels from Japan (extension "_japic" or "_yj") and from the USA (extension "_ndc").

Last updated: October 7, 2013