KEGG SSDB Database

Orthologs/paralogs, conserved gene clusters, and sequence motifs

Enter org:gene (example: syn:ssr3451)

Precomputed sequence similarities

KEGG SSDB (Sequence Similarity DataBase) contains information about amino acid sequence similarities among all protein-coding genes in the complete genomes, computationally generated from the GENES database in KEGG. All possible pairwise genome comparisons are performed with the SSEARCH program, and gene pairs with a Smith-Waterman similarity score of 100 or more are entered in SSDB, together with information about best hits and bidirectional best hits (best-best hits). SSDB is thus a huge weighted, directed graph, which can be used to search for orthologs and paralogs, as well as for conserved gene clusters when positional correlations on the chromosome are additionally considered.

The relationship of gene x in genome A and gene y in genome B is defined as follows:
forward best: x is compared against all genes in genome B and y is found as top-scoring
reverse best: y is compared against all genes in genome A and x is found as top-scoring
best-best: both of these relationships hold
(Note) The option to search reverse best hits is discontinued; "forward best" is now simply called "best".
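
The definitions above can be sketched in a few lines of Python. This is only an illustration, not SSDB's implementation: the input layout (a dictionary of pairwise scores), the function names, and the organism codes `aaa` and `bbb` are assumptions made for the example, and ties are broken arbitrarily by keeping the first top score seen.

```python
MIN_SCORE = 100  # SSDB stores gene pairs with a score of 100 or more


def best_hits(scores, min_score=MIN_SCORE):
    """Return {(query_gene, target_genome): best_target_gene}.

    `scores` maps (query_gene, target_gene) to a similarity score, with
    genes written as "org:gene". This layout is hypothetical.
    """
    best = {}
    for (query, target), score in scores.items():
        if score < min_score:
            continue  # below the storage threshold
        genome = target.split(":", 1)[0]
        key = (query, genome)
        # Keep the top-scoring target per query gene and target genome.
        if key not in best or score > best[key][1]:
            best[key] = (target, score)
    return {key: target for key, (target, _) in best.items()}


def is_best_best(x, y, best):
    """True if y is the best hit of x and x is the best hit of y."""
    genome_x = x.split(":", 1)[0]
    genome_y = y.split(":", 1)[0]
    return best.get((x, genome_y)) == y and best.get((y, genome_x)) == x


# Hypothetical scores between two made-up genomes, aaa and bbb.
scores = {
    ("aaa:g1", "bbb:h1"): 500,
    ("aaa:g1", "bbb:h2"): 300,
    ("bbb:h1", "aaa:g1"): 500,
    ("bbb:h2", "aaa:g1"): 300,
    ("aaa:g2", "bbb:h1"): 90,  # dropped: below 100
}
best = best_hits(scores)
```

Here `bbb:h2` has `aaa:g1` as its best hit, but the reverse does not hold, so the pair is a best hit in one direction only and not a best-best hit.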

Orthologs and paralogs

In order to speed up the search, SSDB is organized as a collection of "GFIT tables" containing selected information that is useful for identifying possible orthologs and paralogs. This includes not only the score and the direction of best hits, but also the margin, which is the score difference between the best hit and the second best hit.
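
As a rough illustration of the margin, the sketch below ranks the hits of one query gene against one genome and subtracts the second-best score from the best. The function name and the input layout are hypothetical, and returning `None` as the margin when there is only one hit is an assumption of this example; how SSDB handles that case is not stated here.

```python
def best_hit_with_margin(hits):
    """Return (best_target, best_score, margin) for one query gene.

    `hits` is a list of (target_gene, score) pairs against a single
    genome (hypothetical layout). The margin is the score difference
    between the best hit and the second best hit.
    """
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    if not ranked:
        return None  # no hits at all
    target, score = ranked[0]
    # Assumption: with a single hit there is no second best, so no margin.
    margin = score - ranked[1][1] if len(ranked) > 1 else None
    return target, score, margin


# Hypothetical hits of one query gene against a made-up genome bbb.
result = best_hit_with_margin([("bbb:h2", 320), ("bbb:h1", 500)])
```

A large margin suggests that the best hit stands out clearly from the other candidates in that genome, while a small one suggests several comparably similar genes.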

Search orthologs: (enter keggid in the form of org:gene, e.g., syn:sll1452)
(all organisms or a selected organism group)
Search paralogs: (enter keggid)

Conserved gene clusters

SSDB is useful for efficiently searching for a conserved gene cluster containing the query gene. First, the query gene and its best-best hits are taken as the initial cluster. Second, neighboring genes on both sides along the chromosome are added to the cluster as long as they are also best-best hits. Third, gapped genes are included in the cluster if they are forward best hits.
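
The three steps can be approximated with the sketch below for a single pair of genomes. It is one simplified reading of the procedure, not SSDB's algorithm: it looks only at gene order in the query genome and ignores positions in the other genome, it treats a "gapped gene" as a forward-best gene lying between two best-best genes, and the `max_gap` parameter and all names are hypothetical.

```python
def extend_cluster(order, query, best_best, forward_best, max_gap=1):
    """Grow a cluster around `query` along one chromosome.

    order        -- gene IDs in chromosomal order (query genome)
    best_best    -- genes with a best-best hit in the other genome
    forward_best -- genes with a forward best hit in the other genome
    max_gap      -- assumed limit on consecutive gapped genes
    """
    start = order.index(query)
    cluster = {start}  # step 1: start from the query gene
    for step in (-1, 1):  # extend to the left, then to the right
        pos = start + step
        pending = []  # gapped genes waiting for a best-best gene beyond them
        while 0 <= pos < len(order):
            gene = order[pos]
            if gene in best_best:
                # step 2: best-best neighbor; step 3: accept bridged gaps
                cluster.update(pending)
                cluster.add(pos)
                pending = []
            elif gene in forward_best and len(pending) < max_gap:
                pending.append(pos)
            else:
                break
            pos += step
    return [order[i] for i in sorted(cluster)]


# Hypothetical gene order and hit sets.
order = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
cluster = extend_cluster(
    order,
    query="g4",
    best_best={"g3", "g4", "g6"},
    forward_best={"g3", "g4", "g5", "g6"},
)
```

In this example `g5` is only a forward best hit, but it sits between the best-best genes `g4` and `g6`, so it is kept as a gapped member; extension stops at `g2` and `g7`, which have no best hit at all.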

Search conserved gene clusters: (enter keggid)

Precomputed sequence motifs

SSDB also contains precomputed Pfam protein domains, here called motifs, for all protein-coding genes.

Search motifs: (enter keggid in the form of org:gene, e.g., eco:b0002)
Search common motifs: (enter multiple keggids, e.g., eco:b0002 eco:b3940 eco:b4024)
Search sequences with given motifs: (enter one or more motif identifiers, e.g., pf:DnaJ)

Search against: all organisms, or a selected organism (three-letter code such as hsa)
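
Conceptually, the two multi-gene motif searches reduce to set operations over a gene-to-motif table. The sketch below assumes such a table is available as a dictionary; the gene IDs, motif names, and function names are placeholders, not real SSDB data or an SSDB API, and requiring all given motifs (rather than any of them) is an assumption of this example.

```python
def common_motifs(gene_motifs, genes):
    """Return the motifs shared by every gene in `genes`."""
    sets = [set(gene_motifs.get(gene, ())) for gene in genes]
    return set.intersection(*sets) if sets else set()


def genes_with_motifs(gene_motifs, motifs):
    """Return the genes that carry all of the given motifs (assumption)."""
    wanted = set(motifs)
    return sorted(g for g, found in gene_motifs.items() if wanted <= set(found))


# Placeholder table: gene ID -> motif identifiers found in that gene.
gene_motifs = {
    "aaa:g1": {"pf:MotifA", "pf:MotifB"},
    "aaa:g2": {"pf:MotifA"},
    "aaa:g3": {"pf:MotifB", "pf:MotifC"},
}
shared = common_motifs(gene_motifs, ["aaa:g1", "aaa:g2"])
carriers = genes_with_motifs(gene_motifs, ["pf:MotifB"])
```

Restricting the search to one organism would then amount to filtering the table by the organism prefix of each gene ID before running either function.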

Last updated: July 4, 2012