KEGG ENZYME Database

Enzyme Nomenclature

KEGG ENZYME is an implementation of the Enzyme Nomenclature (EC number system) produced by the IUBMB/IUPAC Biochemical Nomenclature Committee. KEGG ENZYME is based on the ExplorEnz database at Trinity College Dublin, and is maintained in the KEGG relational database with additional annotation of reaction hierarchy and sequence data links.

EC numbers are not primary identifiers in KEGG

When the KEGG database was initiated in 1995, the EC number was used as a primary identifier for reconstructing the metabolic pathway from the complete genome. This was accomplished by drawing metabolic pathway maps with EC number nodes and by annotating enzyme genes in the genome with EC numbers. By 1999 ortholog IDs were introduced to replace EC numbers and to cover both metabolic and non-metabolic pathways. Ortholog IDs became KOs in 2002. Currently EC numbers are treated as attributes of KOs. Reverse links from EC numbers to K numbers (KO identifiers) are shown in the following Brite hierarchy file.

Enzymes

Because EC numbers are assigned on the basis of experiments with specific proteins in specific organisms, it is necessary to examine whether they can be extended to other proteins in other organisms. This extension is done through KOs. Furthermore, there are cases where multiple EC numbers are mapped to the same KO, or where a single EC number should be split into multiple sequence similarity groups (which often represent multiple organism groups), as shown in the following examples.

Enzyme	Identifier	NAD	NADP	NAD(P)
glucose-6-phosphate dehydrogenase	EC	1.1.1.388	1.1.1.49	1.1.1.363
glucose-6-phosphate dehydrogenase	KO	K19243	K00036	K00036
L-arabinose 1-dehydrogenase	EC	1.1.1.46	-	1.1.1.376
L-arabinose 1-dehydrogenase	KO	?	-	K13873 K19660
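
The many-to-many relationship in the table above can be sketched as a small lookup structure. The following Python is a minimal illustration, assuming only the EC and KO pairs listed on this page; the names `EC_TO_KO` and `invert` are hypothetical and are not part of any KEGG tool.

```python
from collections import defaultdict

# EC number -> KO identifiers, copied from the examples on this page.
# An empty list marks an EC number with no KO assigned (the "?" cell).
EC_TO_KO = {
    "1.1.1.388": ["K19243"],
    "1.1.1.49": ["K00036"],
    "1.1.1.363": ["K00036"],
    "1.1.1.46": [],
    "1.1.1.376": ["K13873", "K19660"],
}


def invert(ec_to_ko):
    """Build the reverse index: KO identifier -> sorted list of EC numbers."""
    ko_to_ec = defaultdict(list)
    for ec, kos in ec_to_ko.items():
        for ko in kos:
            ko_to_ec[ko].append(ec)
    return {ko: sorted(ecs) for ko, ecs in ko_to_ec.items()}


KO_TO_EC = invert(EC_TO_KO)

# KOs that carry more than one EC number as attributes.
multi_ec_kos = {ko: ecs for ko, ecs in KO_TO_EC.items() if len(ecs) > 1}

# EC numbers split across more than one KO (sequence similarity groups).
split_ecs = {ec: kos for ec, kos in EC_TO_KO.items() if len(kos) > 1}

# EC numbers that have no KO yet.
unassigned = [ec for ec, kos in EC_TO_KO.items() if not kos]
```

Treating the EC number as an attribute of the KO, rather than as a key, is what lets one KO (K00036) hold two EC numbers while one EC number (1.1.1.376) is spread over two KOs.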

EC numbers are manually linked to sequence information

The EC numbers in the Enzyme Nomenclature list are given by the IUBMB/IUPAC Biochemical Nomenclature Committee (formerly, Enzyme Commission) based on published experimental data on enzymatic reactions. However, the Enzyme Nomenclature list does not contain amino acid sequence information for the enzymes used in the experiments. Nevertheless, it is common practice to assign EC numbers to entries in sequence databases, including KEGG, and this practice needs a firmer foundation.

Thus, efforts are being made to identify the protein sequences used in the original experiments, based on the references given in the Enzyme Nomenclature list provided by the ExplorEnz database. For example, the following table shows the references and sequence data corresponding to the EC numbers shown above.

EC	KO	PMID	Sequence
1.1.1.388	K19243	25836736	[hvo:HVO_0511]
1.1.1.49	K00036	12215813	[aae:aq_497]
1.1.1.363	K00036	4396688	[ag:AAA25265]
1.1.1.376	K13873	16326697	[ag:BAD95974]
1.1.1.376	K19660	23949136	[hvo:HVO_B0032]

Note that the organism code "ag" represents the new KEGG GENES addendum category, which is an individual protein-based and publication-based collection of functionally characterized proteins. Many reactions, including EC number "1.1.1.46" shown above, were characterized before sequence data became available.
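
The sequence identifiers in the table above follow an `org:gene` pattern, so addendum entries can be told apart from genome-derived entries by their organism code. The sketch below parses the rows as they appear on this page; the `SequenceLink` type and `parse_row` helper are hypothetical names introduced for illustration, and the tab-separated layout is an assumption about how the table is stored.

```python
from typing import NamedTuple


class SequenceLink(NamedTuple):
    ec: str    # EC number
    ko: str    # KO identifier (K number)
    pmid: str  # PubMed ID of the reference
    org: str   # KEGG organism code, or "ag" for the addendum category
    gene: str  # gene or protein identifier within that category


# Rows copied from the table on this page (tab-separated).
ROWS = """\
1.1.1.388\tK19243\t25836736\t[hvo:HVO_0511]
1.1.1.49\tK00036\t12215813\t[aae:aq_497]
1.1.1.363\tK00036\t4396688\t[ag:AAA25265]
1.1.1.376\tK13873\t16326697\t[ag:BAD95974]
1.1.1.376\tK19660\t23949136\t[hvo:HVO_B0032]
"""


def parse_row(line):
    """Split one table row and unpack the bracketed org:gene identifier."""
    ec, ko, pmid, seq = line.split("\t")
    org, gene = seq.strip("[]").split(":", 1)
    return SequenceLink(ec, ko, pmid, org, gene)


links = [parse_row(line) for line in ROWS.splitlines()]

# Entries from the publication-based addendum versus complete genomes.
addendum = [link for link in links if link.org == "ag"]
genome = [link for link in links if link.org != "ag"]
```

Here `addendum` holds the two rows whose sequences come from the "ag" category, and `genome` holds the three rows tied to an organism code.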

The following is a list of currently identified sequences.

Sequence data for EC numbers

It contains the following sequence data:

GenBank - author submitted nucleotide sequence data (whenever available)
NCBI Protein - enzyme sequence data in this submission (whenever available)
KEGG GENES - identified enzyme sequence data

which are all manually identified, and

PDB - enzyme 3D structure data

which are computationally generated by matching PubMed IDs.
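
Matching by PubMed ID amounts to a join between the references of each EC number and the citations of each structure entry. The following is a minimal sketch of that idea, not the procedure KEGG actually runs; the function name `link_by_pmid` and all identifiers in the sample data are placeholders invented for illustration.

```python
from collections import defaultdict


def link_by_pmid(ec_refs, structure_citations):
    """Join EC references and structure citations on a shared PubMed ID.

    ec_refs: iterable of (ec_number, pmid) pairs
    structure_citations: iterable of (structure_id, pmid) pairs
    Returns a dict mapping each EC number to the structure IDs whose
    citation matches one of its references; ECs with no match are omitted.
    """
    structures_by_pmid = defaultdict(list)
    for structure_id, pmid in structure_citations:
        structures_by_pmid[pmid].append(structure_id)

    links = defaultdict(list)
    for ec, pmid in ec_refs:
        links[ec].extend(structures_by_pmid.get(pmid, []))
    return {ec: ids for ec, ids in links.items() if ids}


# Placeholder data only: none of these are real EC numbers, PubMed IDs,
# or PDB entries.
sample_ec_refs = [("EC_X", "PMID_1"), ("EC_X", "PMID_2"), ("EC_Y", "PMID_3")]
sample_structures = [
    ("STRUCT_A", "PMID_1"),
    ("STRUCT_B", "PMID_2"),
    ("STRUCT_C", "PMID_9"),
]

matched = link_by_pmid(sample_ec_refs, sample_structures)
```

With the placeholder data, only `EC_X` gains structure links, because `EC_Y` shares no PubMed ID with any structure record.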

Reference

Kanehisa, M.; Enzyme annotation and metabolic reconstruction using KEGG. Methods Mol. Biol. 1611, 135-145 (2017).

Last updated: May 1, 2021
