Enzyme Nomenclature

KEGG ENZYME is an implementation of the Enzyme Nomenclature (EC number) produced by the IUBMB/IUPAC Biochemical Nomenclature Committee. KEGG ENZYME is based on the ExplorEnz database at Trinity College Dublin, and is maintained in the KEGG LIGAND relational database with additional annotation of reaction hierarchy and sequence data links.

EC numbers are not primary identifiers in KEGG

When the KEGG database was initiated in 1995, the EC number was used as a primary identifier for reconstructing the metabolic pathway from the complete genome. This was accomplished by drawing metabolic pathway maps with EC number nodes and by annotating enzyme genes in the genome with EC numbers. By 1999, ortholog IDs had been introduced to replace EC numbers and to cover both metabolic and non-metabolic pathways. Ortholog IDs became KOs in 2002. Currently, EC numbers are treated as attributes of KOs. Reverse links from EC numbers to K numbers (KO identifiers) are shown in the following Brite hierarchy file.

Because EC numbers are assigned on the basis of experiments with specific proteins in specific organisms, it is necessary to examine whether they can be extended to other proteins in other organisms. This extension is done through KOs. Furthermore, there are cases where multiple EC numbers are mapped to the same KO, and cases where a single EC number should be split into multiple sequence similarity groups (which often represent multiple organism groups), as shown in the following examples.

glucose-6-phosphate dehydrogenase
K19243 K00036
L-arabinose 1-dehydrogenase -
? - K13873
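
The relationship described above, in which EC numbers are attributes of KOs and reverse links run from EC numbers back to K numbers, can be illustrated with a minimal Python sketch. The identifiers `K_A`, `EC_1`, and so on are placeholders invented for this illustration and are not actual KEGG assignments. The function names are likewise hypothetical and do not correspond to any KEGG tool.

```python
from collections import defaultdict

# Hypothetical KO -> EC attribute table. All identifiers are placeholders
# for illustration only; they are not actual KEGG assignments.
KO_TO_EC = {
    "K_A": ["EC_1", "EC_2"],  # one KO carrying two EC numbers
    "K_B": ["EC_1"],          # a second sequence group sharing EC_1
    "K_C": ["EC_3"],          # a simple one-to-one case
}


def reverse_links(ko_to_ec):
    """Invert KO -> EC attributes into EC -> sorted list of KOs."""
    ec_to_ko = defaultdict(set)
    for ko, ecs in ko_to_ec.items():
        for ec in ecs:
            ec_to_ko[ec].add(ko)
    return {ec: sorted(kos) for ec, kos in ec_to_ko.items()}


def ecs_split_across_kos(ec_to_ko):
    """EC numbers that fall into more than one sequence similarity group."""
    return sorted(ec for ec, kos in ec_to_ko.items() if len(kos) > 1)


def kos_with_multiple_ecs(ko_to_ec):
    """KOs to which more than one EC number is mapped."""
    return sorted(ko for ko, ecs in ko_to_ec.items() if len(ecs) > 1)


EC_TO_KO = reverse_links(KO_TO_EC)
```

In this toy data, `ecs_split_across_kos(EC_TO_KO)` picks out the EC number shared by two KOs, and `kos_with_multiple_ecs(KO_TO_EC)` picks out the KO that carries two EC numbers, which are the two situations the examples above describe.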

EC numbers are manually linked to sequence information

The EC numbers in the Enzyme Nomenclature list are assigned by the IUBMB Nomenclature Committee (formerly the Enzyme Commission) based on published experimental data on enzymatic reactions. However, the Enzyme Nomenclature list does not contain amino acid sequence information for the enzymes used in the experiments. Nevertheless, it is common practice to assign EC numbers in sequence databases, including KEGG, and this practice needs to be better founded.

Thus, efforts are being made to identify the protein sequences used in the original experiments, based on the references given in the Enzyme Nomenclature list provided by the ExplorEnz database. For example, the following table shows the references and sequence data corresponding to the EC numbers shown above.

Note that the organism code "ag" represents the new KEGG GENES addendum category, which is an individual protein-based and publication-based collection of functionally characterized proteins. There are many reactions that had been characterized before sequencing data became available, including the EC number "" shown above.

The following is a list of currently identified sequences. It contains three kinds of sequence data, all manually identified:
  • GenBank - author-submitted nucleotide sequence data (whenever available)
  • NCBI Protein - enzyme sequence data in that submission (whenever available)
  • KEGG GENES - identified enzyme sequence data
and one kind of structure data, computationally generated by matching PubMed IDs:
  • PDB - enzyme 3D structure data
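
As a rough illustration of what matching by PubMed ID could look like, the Python sketch below joins EC references to PDB entries that cite the same publication. The source states only that PubMed IDs are matched, so everything else here is an assumption: the record layout, the function name `link_pdb_by_pmid`, the single citation per PDB entry, and all identifiers, which are placeholders rather than real PMIDs, EC numbers, or PDB IDs.

```python
# Hypothetical inputs; every identifier below is a placeholder.
# EC number -> PubMed IDs of the references listed for that entry.
EC_REFERENCES = {
    "EC_1": {"1001", "1002"},
    "EC_2": {"1003"},
    "EC_3": {"1004"},
}

# PDB entry -> PubMed ID of its citation (assumed: one citation per entry).
PDB_CITATIONS = {
    "PDB_X": "1002",
    "PDB_Y": "1003",
    "PDB_Z": "9999",  # cites a paper not referenced by any EC entry
}


def link_pdb_by_pmid(ec_refs, pdb_citations):
    """Return EC -> sorted PDB entries whose citation PMID is an EC reference."""
    # Index PDB entries by the PubMed ID they cite.
    pmid_to_pdb = {}
    for pdb_id, pmid in pdb_citations.items():
        pmid_to_pdb.setdefault(pmid, []).append(pdb_id)

    # Collect, for each EC number, the PDB entries sharing one of its PMIDs.
    links = {}
    for ec, pmids in ec_refs.items():
        hits = {pdb for pmid in pmids for pdb in pmid_to_pdb.get(pmid, [])}
        links[ec] = sorted(hits)
    return links


PDB_LINKS = link_pdb_by_pmid(EC_REFERENCES, PDB_CITATIONS)
```

An EC entry whose references match no PDB citation ends up with an empty list, which mirrors the fact that a structure link of this kind exists only when a shared publication is found.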

Last updated: June 1, 2016
