Sequence alignment using Longest Common Subsequence algorithm
In molecular biology, DNA and proteins can be represented as sequences of letters. DNA sequences consist of the letters A, T, G and C, representing the nucleobases adenine, thymine, guanine and cytosine. Protein sequences consist of 20 different letters, one for each of the 20 amino acids.
Comparing two sequences, whether from the same organism or from different organisms, is an important task in molecular biology known as sequence comparison. It helps answer many biological questions, for example:
predicting structure and function of proteins
inferring evolutionary history and relatedness of species
locating common subsequences in genes / proteins to identify common motifs
as a sub-problem in genome assembly for DNA sequencing
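The Longest Common Subsequence idea named in the title can be sketched as a small dynamic program. This is a minimal illustration, not necessarily the formulation the full article uses; the function name `lcs` and the example strings are assumptions chosen for the sketch.

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of strings a and b."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Walk back from the bottom-right corner to recover the letters.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


# Hypothetical example strings, not real gene sequences.
print(lcs("AGGTAB", "GXTXAYB"))  # GTAB
```

The table costs time and memory proportional to the product of the two sequence lengths, which is fine for short segments but becomes expensive for whole genes or genomes.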
In this article, we'll talk about a method for constructing evolutionary trees known as character-based evolutionary tree construction. It was initially designed to infer evolutionary relationships from morphological and physiological characters.
In character-based tree construction, we are given a DNA segment from each of several species, taken from the same part of the genome (for example, the same gene). Given these DNA sequences, we would like to construct the evolutionary tree, i.e. predict which species are closely related and share a recent common ancestor, and which species are not closely related and diverged earlier.
The character-based tree construction method rests on the principle of Occam’s razor, which states: “when several hypotheses with different degrees of complexity are proposed to explain the same phenomenon, one should choose the simplest hypothesis”. In terms of tree buil...
Genes encode proteins, and the process by which a gene is used to synthesize its protein is known as gene expression. In higher organisms like humans, thousands of genes are expressed together at different levels, depending on factors such as the type of cell (nerve cell or heart cell), the environment and disease conditions. For example, different types of cancer produce different gene expression patterns in humans. These gene expression patterns under different conditions can be studied using microarray technology.
Microarrays and Gene Expression profiling
Data from a microarray can be thought of as a rectangular matrix or grid, with each cell in the matrix holding the expression value of one gene under one particular condition. As shown in the figur...
Protein structure prediction using homology modeling
What are proteins?
Proteins are large biomolecules that perform most of the functions within an organism's cells, including responding to stimuli, catalyzing reactions, transporting molecules from one place to another and cell signaling. Like DNA sequences, protein sequences are strings of molecular building blocks, but where DNA uses four nucleobases, protein sequences are built from 20 different molecules called amino acids.
Every 1D protein sequence folds into a 3D structure. This 3D structure determines how a protein responds to various environments and which other molecules it interacts with, and hence is critical to the protein's ability to perform its functions. The 3D structure of protein is described by providing the coo...
Evidence from morphological, biochemical, and gene sequence data suggests that all organisms on Earth are genetically related, and that the genealogical relationships of living things can be represented by a vast evolutionary tree, the Tree of Life. An evolutionary tree is a graph in which the sequences under study are represented as leaf nodes, with internal nodes and branches depicting the evolutionary relationships between the sequences. In the majority of cases, the DNA sequences are gene sequences from different organisms, and the tree may represent the actual evolution of those organisms.
Consider 4 gene sequences Human1, Chimpanzee1, Mouse1 and Fish1 from the Human, Chimpanzee, Mouse and Fish species, respectively. We will also assume that these are homologous (equivalent) genes that convert glucose to energy in their respective species. The hypothetical evolutionary tree of the 4 genes is shown in the following figure.
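A tree like this can also be written down as a small data structure. The sketch below encodes one possible topology as nested tuples, with gene names as leaves and each tuple standing for an internal node (a common ancestor). The grouping used here, Human1 with Chimpanzee1, then Mouse1, then Fish1, is an assumption based on the usual species relationships and may not match the figure exactly; the helper names `leaves` and `to_newick` are likewise hypothetical.

```python
from typing import List, Tuple, Union

# A tree is either a leaf name or a pair of subtrees (an internal node).
Tree = Union[str, Tuple["Tree", "Tree"]]

# Assumed topology: Human1 and Chimpanzee1 share the most recent ancestor,
# Mouse1 joins next, and Fish1 diverged earliest.
tree: Tree = ((("Human1", "Chimpanzee1"), "Mouse1"), "Fish1")


def leaves(node: Tree) -> List[str]:
    """Collect leaf names from left to right."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)


def to_newick(node: Tree) -> str:
    """Serialize the tree in Newick format (topology only, no branch lengths)."""
    def render(n: Tree) -> str:
        if isinstance(n, str):
            return n
        left, right = n
        return "(" + render(left) + "," + render(right) + ")"
    return render(node) + ";"


print(leaves(tree))
print(to_newick(tree))
```

Newick strings of this kind are a common plain-text way to exchange tree topologies between phylogenetics tools.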