Precomputed sequence similarities

KEGG SSDB (Sequence Similarity DataBase) contains information on amino acid sequence similarities among all protein-coding genes in the complete genomes of the GENES database, as well as in its addendum and virus categories. All possible pairwise genome (and category) comparisons are performed with the SSEARCH program, and gene pairs with a Smith-Waterman similarity score of 100 or more are entered in SSDB, together with information about best hits and bidirectional best hits (best-best hits). SSDB is thus a huge weighted, directed graph that can be used to search for orthologs and paralogs, as well as for conserved gene clusters when positional correlations on the chromosome are also considered.

The relationship of gene x in genome A and gene y in genome B is defined as follows:
forward best: x is compared against all genes in genome B and y is found as top-scoring
reverse best: y is compared against all genes in genome A and x is found as top-scoring
bidirectional best: both of these relationships hold
(Note) The option to search reverse best hits is discontinued; "forward best" is now simply called "best".
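
The best-hit relationships defined above can be illustrated with a minimal Python sketch. It is not the SSDB implementation: the gene identifiers and scores are made up, the function names best_hits and classify are hypothetical, and for simplicity each gene pair is given a single score that is used in both directions. Only the score threshold of 100 comes from the text. Ties are resolved by keeping the first pair encountered, which is also an assumption.

```python
# Toy similarity scores, invented for illustration:
# (gene in genome A, gene in genome B) -> Smith-Waterman score.
SCORES = {
    ("A:x1", "B:y1"): 850,
    ("A:x1", "B:y2"): 300,
    ("A:x1", "B:y3"): 200,
    ("A:x2", "B:y1"): 400,
    ("A:x2", "B:y2"): 95,   # below the threshold, so it is ignored
    ("A:x3", "B:y2"): 500,
}
THRESHOLD = 100  # minimum score for a pair to be kept, as stated in the text


def best_hits(scores, threshold=THRESHOLD):
    """Return (forward, reverse) maps from a gene to (top partner, score)."""
    forward, reverse = {}, {}
    for (x, y), score in scores.items():
        if score < threshold:
            continue
        # forward: best partner in genome B for each gene of genome A
        if x not in forward or score > forward[x][1]:
            forward[x] = (y, score)
        # reverse: best partner in genome A for each gene of genome B
        if y not in reverse or score > reverse[y][1]:
            reverse[y] = (x, score)
    return forward, reverse


def classify(x, y, forward, reverse):
    """Name the relationship between gene x (genome A) and gene y (genome B)."""
    is_forward = forward.get(x, (None, 0))[0] == y
    is_reverse = reverse.get(y, (None, 0))[0] == x
    if is_forward and is_reverse:
        return "bidirectional best"
    if is_forward:
        return "forward best"
    if is_reverse:
        return "reverse best"
    return "not a best hit"


forward, reverse = best_hits(SCORES)
for pair in [("A:x1", "B:y1"), ("A:x2", "B:y1"), ("A:x1", "B:y3")]:
    print(pair, classify(*pair, forward, reverse))
```

In this toy data, x1 and y1 are each other's top hit, y1 is the top hit of x2 but not the other way round, and x1 is the top hit of y3 but not the other way round.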

Orthologs and paralogs

In order to speed up the search, SSDB is organized as a collection of "GFIT tables" containing selected information that is useful for identifying possible orthologs and paralogs. This includes not only the score and the direction of best hits, but also the margin, which is the score difference between the best hit and the second best hit.
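
As a rough illustration of the margin described above, the following Python sketch computes the best hit of one query gene in one target genome together with the score difference to the second best hit. The function name best_hit_with_margin and the input layout are hypothetical and do not reflect the actual GFIT table format. Returning None as the margin when there is no second hit is an assumption of this sketch.

```python
def best_hit_with_margin(hits):
    """hits: dict mapping target gene -> score for one query gene.

    Returns (best_gene, best_score, margin), or None if there are no hits.
    margin is the best score minus the second best score; it is None when
    only one hit exists (an assumption of this sketch).
    """
    if not hits:
        return None
    # Rank target genes by score, highest first.
    ranked = sorted(hits.items(), key=lambda item: item[1], reverse=True)
    best_gene, best_score = ranked[0]
    margin = best_score - ranked[1][1] if len(ranked) > 1 else None
    return best_gene, best_score, margin


# Made-up scores for a single query gene against three genes of one genome.
print(best_hit_with_margin({"B:y1": 850, "B:y2": 300, "B:y3": 200}))
```

A large margin suggests that the best hit stands well apart from the alternatives, whereas a small margin suggests several similarly scoring candidates in the target genome.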

Search orthologs: (enter keggid in the form of org:gene, e.g., syn:sll1452)
Search against: All organisms or a selected organism group
Search paralogs: (enter keggid)

Conserved gene clusters

SSDB can be used to search efficiently for a conserved gene cluster containing the query gene. First, the query gene and its best hits are taken as an initial cluster. Second, neighboring genes on both sides of the chromosome are added to the cluster as long as they are also best hits. Third, gapped genes are included if they are best hits in other locations on the chromosome.
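
The first two steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the SSDB procedure: it compares the query genome against a single other genome, it does not model the third step (gapped genes), and the function name conserved_cluster, the input structures, and the window parameter are all hypothetical. A neighbor is accepted here only if its best hit lies within window positions of a hit already in the cluster, which is one possible reading of "positional correlation".

```python
def conserved_cluster(query, order_a, best, pos_b, window=1):
    """Grow a cluster around `query` along genome A.

    order_a: genes of genome A in chromosomal order
    best:    gene in A -> its best hit in genome B (absent if it has none)
    pos_b:   gene in B -> chromosomal index in genome B
    window:  hypothetical tolerance for the distance between a new hit and
             the hits already in the cluster on genome B
    Returns a list of (gene_a, gene_b) pairs in genome A order.
    """
    if query not in best:
        return []
    start = order_a.index(query)
    # Step 1: the query gene and its best hit form the initial cluster.
    cluster = {start: best[query]}
    hit_positions = {pos_b[best[query]]}

    # Step 2: extend to the left, then to the right, one neighbor at a time.
    for step in (-1, 1):
        i = start + step
        while 0 <= i < len(order_a):
            hit = best.get(order_a[i])
            if hit is None:
                break  # neighbor has no best hit: stop extending this side
            p = pos_b[hit]
            if min(abs(p - q) for q in hit_positions) > window:
                break  # best hit is too far from the cluster on genome B
            cluster[i] = hit
            hit_positions.add(p)
            i += step
    return [(order_a[i], cluster[i]) for i in sorted(cluster)]


# Made-up example: a2, a3, a4 hit the adjacent genes b3, b4, b5.
order_a = ["a1", "a2", "a3", "a4", "a5"]
best = {"a1": "b9", "a2": "b3", "a3": "b4", "a4": "b5"}
pos_b = {"b3": 3, "b4": 4, "b5": 5, "b9": 9}
print(conserved_cluster("a3", order_a, best, pos_b))
```

In this example, extension stops on the left at a1, whose best hit b9 lies far from the cluster, and on the right at a5, which has no best hit.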

Search conserved gene clusters: (enter keggid)

Precomputed sequence motifs

SSDB also contains precomputed Pfam protein domains, here called motifs, for all protein-coding genes.

Search motifs: (enter keggid in the form of org:gene, e.g., eco:b0002)
Search common motifs: (enter multiple keggids, e.g., eco:b0002 eco:b3940 eco:b4024)
Search sequences with given motifs: (enter one or more motif identifiers, e.g., pf:DnaJ)

Search against: All organisms or a selected organism (three-letter code such as hsa)
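
The motif searches listed above amount to simple set operations, as the following Python sketch suggests. The gene and motif identifiers are placeholders invented for illustration and do not describe real KEGG entries, and the function names common_motifs and genes_with_motifs are hypothetical. The sketch also assumes that a search with several motifs returns genes containing all of them; the source does not state this.

```python
# Placeholder motif assignments (invented): keggid -> set of motif identifiers.
GENE_MOTIFS = {
    "org:g1": {"pf:MotifA", "pf:MotifB"},
    "org:g2": {"pf:MotifA", "pf:MotifC"},
    "org:g3": {"pf:MotifA", "pf:MotifB", "pf:MotifC"},
}


def common_motifs(genes, table=GENE_MOTIFS):
    """Motifs shared by every gene in `genes` (set intersection)."""
    motif_sets = [table.get(gene, set()) for gene in genes]
    return set.intersection(*motif_sets) if motif_sets else set()


def genes_with_motifs(motifs, table=GENE_MOTIFS):
    """Genes whose motif set contains all of `motifs` (assumed AND search)."""
    wanted = set(motifs)
    return sorted(gene for gene, found in table.items() if wanted <= found)


print(common_motifs(["org:g1", "org:g2"]))
print(genes_with_motifs(["pf:MotifA", "pf:MotifB"]))
```

Restricting a search to a selected organism would correspond to filtering the table by the organism prefix of each keggid before applying these functions.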

Last updated: October 1, 2017