
KEGG Syntax - Genome Alignment


Genome Alignment

Genome alignment is usually done by aligning the nucleotide sequences of two genomes. Here each genome is instead treated as a sequence of genes identified by KOs (K numbers), and the genome alignment is done by aligning these sequences of K numbers. This reduces gene order alignment to the comparison of two label sequences, which significantly simplifies the problem.

We have developed a new tool that uses the Goad-Kanehisa algorithm (see below) to find all instances of locally similar gene orders in two genomes that score above a given threshold.

More details about this implementation
  1. Gene orders are available for KEGG organisms with the NCBI assembly level of "Complete Genome" or "Chromosome" (see br08611) and all viruses (see br08621).
  2. Genes for CDS, tRNA and rRNA are considered.
  3. Since genes are labeled with K numbers, the gene order is converted to a sequence of K numbers.
  4. The algorithm is applied to the comparison of two such gene order sequences with the scoring of
       match: 1, mismatch: -1, gap: -1, neutral: 0
    where neutral means the alignment of genes without K numbers.
  5. Locally similar gene orders with the score of 3 or more are reported.
  6. When genes with the same K numbers are repeated, they are combined into a single unit with the number of repeats in parentheses in the output, enabling the alignment of varying numbers of repeats in two sequences.
  7. When genes on the complementary strand are matched, they are marked with "<" in the output.
  8. Gene order sequences are compared twice: once in the forward-forward direction and once in the forward-reverse direction.
  9. The reverse direction is marked with "(r)" in the output.
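
The scoring and reporting rules in items 4 to 8 can be illustrated with a minimal Python sketch. It is not the KEGG implementation and it does not reproduce the Goad-Kanehisa algorithm: it uses a plain Smith-Waterman style recurrence that only returns the best local score. All function names are hypothetical, the K numbers are arbitrary placeholders, and the sketch assumes that repeats are consecutive genes, that repeat counts do not affect the score, that a pair is neutral when either gene lacks a K number (None), and that the reverse direction simply reverses the second sequence without tracking strands.

```python
from itertools import groupby

# Scoring values taken from item 4; threshold from item 5.
MATCH, MISMATCH, GAP, NEUTRAL = 1, -1, -1, 0
THRESHOLD = 3


def collapse_repeats(genes):
    """Merge runs of the same K number into (k_number, count) units.

    Assumption: genes without a K number (None) are kept one by one.
    """
    units = []
    for k_number, run in groupby(genes):
        count = len(list(run))
        if k_number is None:
            units.extend([(None, 1)] * count)
        else:
            units.append((k_number, count))
    return units


def pair_score(u, v):
    """Score one aligned pair of units; repeat counts are ignored (assumption)."""
    if u[0] is None or v[0] is None:
        return NEUTRAL
    return MATCH if u[0] == v[0] else MISMATCH


def best_local_score(a, b):
    """Best local alignment score, with negative running scores reset to 0."""
    prev = [0] * (len(b) + 1)
    best = 0
    for u in a:
        cur = [0]
        for j, v in enumerate(b, start=1):
            s = max(0,
                    prev[j - 1] + pair_score(u, v),  # align u with v
                    prev[j] + GAP,                   # gap in b
                    cur[j - 1] + GAP)                # gap in a
            cur.append(s)
            best = max(best, s)
        prev = cur
    return best


def compare(genes_a, genes_b):
    """Compare in forward-forward and forward-reverse directions."""
    a = collapse_repeats(genes_a)
    b = collapse_repeats(genes_b)
    return {"forward": best_local_score(a, b),
            "reverse": best_local_score(a, b[::-1])}


# Placeholder gene orders; None marks a gene without a K number.
genes_a = ["K00001", "K00002", "K00002", "K00003", None, "K00004"]
genes_b = ["K00001", "K00002", "K00003", "K00009", "K00004"]
scores = compare(genes_a, genes_b)
reported = {d: s for d, s in scores.items() if s >= THRESHOLD}
print(scores, reported)
```

In this example the forward comparison reaches a score of 4, because the repeated K00002 collapses into one unit and the gene without a K number aligns neutrally, so only that direction passes the threshold of 3.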

About the Goad-Kanehisa Algorithm

In the early 1980s, during the pre-GenBank Los Alamos Sequence Library project, Goad and Kanehisa developed an algorithm for finding locally similar regions of two sequences and reported it in Nucleic Acids Res 10:247-263 (1982) [doi]. The essence of this algorithm is to prune paths by taking a logical product of forward and reverse path matrices. This comes in addition to the pruning associated with a weighting scheme that does not allow negative score values, which is similar to the Smith-Waterman algorithm [doi], as mentioned in their Note added in proof.
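
As a loose illustration only, the following Python sketch shows one way a logical product of a forward and a reverse matrix can prune cells. It is not the published algorithm. It assumes that the forward matrix marks cells where some positive-scoring local alignment ends, that the reverse matrix (computed on both sequences reversed) marks cells where one starts, and that a cell is kept only when both hold. The function names and K number labels are hypothetical, and the scoring values are borrowed from the list above.

```python
MATCH, MISMATCH, GAP = 1, -1, -1


def score_matrix(a, b):
    """Local-alignment score matrix; negative values are clamped to 0."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            s = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            h[i][j] = max(0,
                          h[i - 1][j - 1] + s,
                          h[i - 1][j] + GAP,
                          h[i][j - 1] + GAP)
    return h


def pruned_cells(a, b):
    """Element-wise AND of forward and reverse positive-score masks."""
    n, m = len(a), len(b)
    fwd = score_matrix(a, b)
    rev = score_matrix(a[::-1], b[::-1])
    keep = [[False] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            ends_here = fwd[i + 1][j + 1] > 0
            # Cell (i, j) maps to (n - i, m - j) in the reversed matrix.
            starts_here = rev[n - i][m - j] > 0
            keep[i][j] = ends_here and starts_here
    return keep


a = ["K1", "K2", "K3"]
b = ["K1", "K2", "K9"]
forward_only = sum(v > 0 for row in score_matrix(a, b) for v in row)
kept = pruned_cells(a, b)
print(forward_only, sum(map(sum, kept)))
```

For these placeholder sequences the forward pass alone leaves five cells with a positive score, while the product with the reverse mask keeps only the two cells on the matching K1, K2 diagonal.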

For protein and nucleic acid sequence alignments, the approach taken by Smith and Waterman for finding the best local similarity is sufficient. However, for the gene order alignment of two genomes, in which many gene positions are likely to be split and changed, the Goad-Kanehisa algorithm is better suited for finding a comprehensive set of local similarities.


Last updated: April 12, 2024