Evidence from morphological, biochemical, and gene sequence data suggests that all organisms on Earth are genetically related, and that the genealogical relationships of living things can be represented by a vast evolutionary tree, often called the Tree of Life. An evolutionary tree is a graph in which the sequences under study are represented as leaf nodes, while internal nodes and branches depict the evolutionary relationships between those sequences. In the majority of cases, the sequences are gene sequences from different organisms, and the tree may then reflect the actual evolution of those organisms.
Consider four gene sequences, Human1, Chimpanzee1, Mouse1 and Fish1, from the Human, Chimpanzee, Mouse and Fish species, respectively. We will also assume that these are homologous (equivalent) genes that convert glucose to energy in their respective species. The hypothetical evolutionary tree of the four genes is shown in the following figure.
This tree shows how the present-day (extant) genes from the four species have evolved from common ancestors. There was a common ancestral gene (the root of the tree) which split into two different genes: one became the present-day Fish1 gene and the other was the common ancestral gene of Mouse1, Chimpanzee1 and Human1. Then, that common ancestral gene evolved into the present-day Mouse1 gene and the common ancestral gene of Human1 and Chimpanzee1. Finally, the common ancestral gene of Human1 and Chimpanzee1 evolved into the present-day Human1 and Chimpanzee1 genes.
The branch lengths show how much each of the four genes has evolved relative to the others. For example, the Human1 sequence evolved twice as much as the Chimpanzee1 sequence after the split from their common ancestral sequence. The evolutionary distance between two gene sequences is the sum of the branch lengths traversed on the path from one sequence to the other. For example, the evolutionary distance between the Fish1 and Human1 genes is distance(Fish1, Human1) = 3 + 1 + 1 + 2 = 7.
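The path-sum definition of distance can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the internal node names (Root, AncMCH, AncHC) are hypothetical labels, and the Mouse1 branch length is not given in the text, so the value used here is a placeholder. The other branch lengths are the ones read off the example above.

```python
# Each node maps to (parent, length of the branch joining it to that parent).
# "Root", "AncMCH" and "AncHC" are hypothetical names for the internal nodes.
PARENT = {
    "Fish1": ("Root", 3.0),
    "AncMCH": ("Root", 1.0),       # ancestor of Mouse1, Chimpanzee1, Human1
    "Mouse1": ("AncMCH", 2.0),     # assumed value; not stated in the text
    "AncHC": ("AncMCH", 1.0),      # ancestor of Human1 and Chimpanzee1
    "Human1": ("AncHC", 2.0),
    "Chimpanzee1": ("AncHC", 1.0),
}


def path_to_root(node):
    """Map the node and each of its ancestors to the distance from the node."""
    distances = {node: 0.0}
    total = 0.0
    while node in PARENT:
        parent, length = PARENT[node]
        total += length
        distances[parent] = total
        node = parent
    return distances


def distance(a, b):
    """Sum of branch lengths on the path between leaves a and b."""
    up_a = path_to_root(a)
    up_b = path_to_root(b)
    # With non-negative branch lengths, the closest shared ancestor
    # gives the smallest combined distance.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


print(distance("Fish1", "Human1"))        # 7.0
print(distance("Human1", "Chimpanzee1"))  # 3.0
```

Storing only parent links keeps the sketch short; a real phylogenetics library would typically offer its own tree type and distance method.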
This evolutionary tree shows the evolutionary relationships among the genes only, and may or may not represent the evolutionary relationships among the species that contain those genes. If these genes do represent the evolutionary relationships of the four species, then we can say that chimpanzees are the closest relatives of humans, and that mice are closer to humans than fish are, because fish diverged from the human lineage before mice did.
The tree shown above is called a rooted tree because the placement of the common ancestor of all the genes is known (between Fish1 and the rest of the sequences). There is another version of the tree, called the unrooted tree, which is shown below.
An unrooted tree shows only the relative relationships of the genes; it has no point of origin and does not place the oldest common ancestor.
Labeled vs unlabeled trees: A labeled tree has a specific value assigned to each leaf; an unlabeled tree does not.
Scaled vs unscaled trees: A scaled tree has edge lengths drawn in proportion to a specific unit, e.g. evolutionary time. An unscaled tree does not.
Bifurcating vs multifurcating trees: A bifurcating tree has exactly two children for each internal node. A multifurcating tree has at least one internal node with more than two children.
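As a small illustration of the bifurcating vs multifurcating distinction, the sketch below represents a rooted tree as nested tuples, with strings as leaves. Both this representation and the multifurcating example tree are assumptions made for illustration; they do not come from any particular library.

```python
def is_bifurcating(tree):
    """Return True if every internal node has exactly two children."""
    if isinstance(tree, str):  # a leaf has no children to check
        return True
    return len(tree) == 2 and all(is_bifurcating(child) for child in tree)


# The rooted four-gene tree from the example above.
rooted = ("Fish1", ("Mouse1", ("Human1", "Chimpanzee1")))

# A hypothetical tree in which one internal node has three children.
multifurcating = ("Fish1", ("Mouse1", "Human1", "Chimpanzee1"))

print(is_bifurcating(rooted))          # True
print(is_bifurcating(multifurcating))  # False
```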
A rooted labeled scaled bifurcating evolutionary tree
The molecular clock hypothesis assumes that the rate of evolution is constant across all independent lineages, so that the branch lengths of the present-day sequences in the tree are directly proportional to how far back in time the sequences diverged. Let us see this with the previous example involving the four genes.
According to the rooted tree, the Fish1 sequence was the earliest to separate, or diverge, from the other three, say 3 million years ago. That means the Fish1 sequence has been evolving independently for 3 million years. Then, a million years after the Fish1 divergence, the Mouse1 sequence was formed (say 2 million years ago). A million years after the Mouse1 formation, the Human1 and Chimpanzee1 genes were formed, and they have each been evolving independently for a million years.
According to the molecular clock hypothesis, since the Fish1 gene has been evolving independently for the longest time, it should have the longest branch in the tree, and Human1 and Chimpanzee1 should have equal and shortest branches because they have been evolving independently for the shortest time. It is as if each gene starts a molecular clock when it is formed and all the clocks tick at the same rate, so each branch length corresponds to the amount of time its clock has been ticking. The molecular clock assumption produces an evolutionary tree called an ultrametric tree.
The two important properties of ultrametric trees are:
- The branch lengths for the most recently diverged sequences are equal. The Human1 and Chimpanzee1 genes have equal branch lengths since they have both been evolving for a million years.
- The path lengths from the root (root node) to all the genes (leaf nodes) are equal, since the total period of evolution from the root is the same for all the genes. In our example, all four genes have been evolving for 3 million years.
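The second property suggests a simple check: compute the root-to-leaf path length for every leaf and test whether they all agree. The sketch below does this for the four-gene example, using branch lengths in millions of years taken from the divergence times above. The tuple layout (name, branch length, children) and the internal node names are assumptions made for illustration.

```python
import math

# Node layout: (name, length of branch to parent, list of child nodes).
# Internal node names are hypothetical; lengths are in millions of years.
clock_tree = ("Root", 0.0, [
    ("Fish1", 3.0, []),
    ("AncMCH", 1.0, [
        ("Mouse1", 2.0, []),
        ("AncHC", 1.0, [
            ("Human1", 1.0, []),
            ("Chimpanzee1", 1.0, []),
        ]),
    ]),
])


def leaf_depths(node, depth=0.0):
    """Return {leaf name: path length from the root to that leaf}."""
    name, length, children = node
    depth += length
    if not children:
        return {name: depth}
    depths = {}
    for child in children:
        depths.update(leaf_depths(child, depth))
    return depths


def is_ultrametric(root, tol=1e-9):
    """True if all root-to-leaf path lengths are equal within tol."""
    depths = list(leaf_depths(root).values())
    return all(math.isclose(d, depths[0], abs_tol=tol) for d in depths)


print(leaf_depths(clock_tree))
# {'Fish1': 3.0, 'Mouse1': 3.0, 'Human1': 3.0, 'Chimpanzee1': 3.0}
print(is_ultrametric(clock_tree))  # True
```

If the Human1 branch were 2 instead of 1, as in the earlier non-clock example, its root-to-leaf path would be 4 and the check would return False.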
In this tutorial we introduced the concept of evolutionary trees and their various types. In the upcoming tutorials, we'll consider specific algorithms to construct evolutionary trees from gene sequencing data.