Sequence alignment using the Longest Common Subsequence algorithm
In molecular biology, DNA and proteins can be represented as sequences of letters drawn from a fixed alphabet. DNA sequences consist of the letters A, T, G and C, representing the nucleobases adenine, thymine, guanine and cytosine. Protein sequences use 20 different letters, one for each of the 20 standard amino acids.
Comparing two sequences, whether from the same organism or from different organisms, is known as sequence comparison and is an important task in molecular biology. It helps answer many biological questions, for example:
predicting the structure and function of proteins
inferring the evolutionary history and relatedness of species
locating common subsequences in genes or proteins to identify common motifs
as a sub-problem in genome assembly for DNA sequencing
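Since the title refers to the Longest Common Subsequence (LCS) algorithm, here is a minimal sketch of the classic dynamic-programming formulation applied to two DNA strings. It is an illustration under stated assumptions, not necessarily the exact method this article goes on to use: the function name `lcs` is hypothetical, sequences are assumed to be plain Python strings over A, C, G, T, and only one LCS is returned when several exist.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b.

    Hypothetical helper: classic dynamic programming,
    O(len(a) * len(b)) time and memory.
    """
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                # Matching letters extend the LCS of the shorter prefixes.
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                # Otherwise drop the last letter of one sequence or the other.
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Trace back from dp[n][m] to recover one LCS (ties prefer moving up).
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


if __name__ == "__main__":
    print(lcs("ACGT", "AGT"))  # prints AGT
```

For example, `lcs("ACGT", "AGT")` returns `"AGT"`: the letters A, G and T appear in that order in both sequences, while C has no counterpart in the second one. The full table makes the quadratic cost explicit; for long genomic sequences, space-saving variants are usually preferred.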