Comparative genomics is a branch of biological research that examines genome sequences across a broad spectrum of species, from bacteria to mice, chimpanzees and humans.[2][3] This large-scale, holistic approach compares two or more genomes to discover the similarities and differences between them and to study the biology of the individual genomes.[4] Comparing whole genome sequences provides a highly detailed view of how organisms are related to each other at the gene level, giving researchers insight into the genetic relationships between organisms and the evolutionary changes that separate them.[2] The major principle of comparative genomics is that common features of two organisms will often be encoded within the DNA that is evolutionarily conserved between them. Comparative genomics therefore provides a powerful tool for studying evolutionary change, helping to identify genes that are conserved across species as well as genes that give each organism its unique characteristics. Moreover, these studies can be performed at different levels of the genome to obtain multiple perspectives on the organisms.[4]
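The distinction between genes conserved across species and genes unique to one organism can be sketched as simple set arithmetic. The Python example below is a minimal illustration under a strong assumption: each genome has already been reduced to a set of gene family identifiers, with homologous genes sharing an ID. The function `core_and_unique`, the species names and the family IDs are all hypothetical; real analyses first need a homology search to build such families.

```python
from typing import Dict, Set, Tuple


def core_and_unique(
    gene_sets: Dict[str, Set[str]],
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Split gene families into those shared by all species and those unique to one.

    Assumes (hypothetically) that homologous genes have already been grouped
    into families with shared IDs.
    """
    if not gene_sets:
        raise ValueError("need at least one species")
    # Families present in every genome: the conserved "core".
    core = set.intersection(*gene_sets.values())
    unique: Dict[str, Set[str]] = {}
    for species, families in gene_sets.items():
        # Union of the families found in all *other* genomes.
        others = set().union(*(f for s, f in gene_sets.items() if s != species))
        unique[species] = families - others
    return core, unique


# Hypothetical toy data: three species with made-up gene family IDs.
example = {
    "species_A": {"F1", "F2", "F3", "F4"},
    "species_B": {"F1", "F2", "F5"},
    "species_C": {"F1", "F2", "F3", "F6"},
}
core, unique = core_and_unique(example)
print(sorted(core))  # ['F1', 'F2']
print({s: sorted(u) for s, u in unique.items()})
# {'species_A': ['F4'], 'species_B': ['F5'], 'species_C': ['F6']}
```

Note that family F3, shared by only two of the three species, falls into neither category; in practice such partially shared genes are often the most informative about lineage relationships.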
Comparative genomic analysis begins with a simple comparison of general genome features such as genome size, number of genes, and chromosome number. Table 1 presents data on several fully sequenced model organisms and highlights some striking findings. For instance, although the tiny flowering plant Arabidopsis thaliana has a smaller genome than the fruit fly Drosophila melanogaster (157 million base pairs v. 165 million base pairs, respectively), it possesses nearly twice as many genes (25,000 v. 13,000). In fact, A. thaliana has approximately the same number of genes as humans (25,000). Thus, a very early lesson of the genomic era is that genome size does not correlate with evolutionary status, nor is the number of genes proportional to genome size.[5]
Table 1: Comparative genome sizes of humans and other model organisms[2]
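The mismatch between genome size and gene number for A. thaliana and D. melanogaster can be made concrete as gene density (genes per million base pairs). The short Python sketch below uses only the approximate figures quoted in the text; `gene_density` is a hypothetical helper name, and the results are only as precise as those rounded inputs.

```python
def gene_density(genome_size_mb: float, gene_count: int) -> float:
    """Return the number of genes per million base pairs (Mb)."""
    return gene_count / genome_size_mb


# Approximate genome sizes (Mb) and gene counts as quoted in the text.
genomes = {
    "Arabidopsis thaliana": (157, 25_000),
    "Drosophila melanogaster": (165, 13_000),
}

for name, (size_mb, genes) in genomes.items():
    print(f"{name}: {gene_density(size_mb, genes):.1f} genes/Mb")
# Arabidopsis thaliana: 159.2 genes/Mb
# Drosophila melanogaster: 78.8 genes/Mb

# If gene number scaled with genome size, the slightly smaller A. thaliana
# genome would be expected to carry slightly fewer genes than the fly's.
expected = 13_000 * 157 / 165
print(f"Size-proportional expectation for A. thaliana: {expected:,.0f} genes")
# Size-proportional expectation for A. thaliana: 12,370 genes
```

The observed 25,000 genes is roughly double that size-proportional expectation, which is the point the comparison makes.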
In comparative genomics, synteny is the preserved order of genes on the chromosomes of related species, which indicates their descent from a common ancestor. Synteny provides a framework in which the conservation of homologous genes and gene order is identified between genomes of different species.[9] Synteny blocks are more formally defined as chromosomal regions, shared between genomes, in which homologous genes derived from a common ancestor occur in the same order.[10][11] Alternative names such as conserved synteny or collinearity have been used interchangeably.[12] Comparisons of genome synteny between and within species have provided an opportunity to study the evolutionary processes that lead to the diversity of chromosome number and structure in many lineages across the tree of life.[13][14] Early discoveries using such approaches include conserved chromosomal regions in nematodes and yeast,[15][16] the evolutionary history and phenotypic traits of the extremely conserved Hox gene clusters across animals and of the MADS-box gene family in plants,[17][18] and karyotype evolution in mammals and plants.[19]
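The notion of a synteny block can be illustrated with a deliberately simplified Python sketch. It assumes each genome is an ordered list of gene identifiers in which homologous genes share an ID and each ID occurs at most once per genome (one-to-one orthology). It then reports maximal runs of genes that are adjacent in both genomes, either in the same order ("+") or inverted ("-"). The function `find_synteny_blocks` and the gene IDs are hypothetical; practical synteny tools also tolerate gaps, gene duplications and small rearrangements inside a block, which this sketch does not.

```python
from typing import Dict, List, Sequence, Tuple


def find_synteny_blocks(
    genome_a: Sequence[str], genome_b: Sequence[str], min_len: int = 2
) -> List[Tuple[List[str], str]]:
    """Return maximal runs of genes that are adjacent and collinear in both genomes.

    Simplifying assumptions: homologous genes share an ID, each ID occurs at
    most once per genome, and no gaps are allowed inside a block.
    """
    if min_len < 2:
        raise ValueError("min_len must be at least 2 to define an orientation")
    pos_b: Dict[str, int] = {gene: i for i, gene in enumerate(genome_b)}
    blocks: List[Tuple[List[str], str]] = []
    run: List[str] = []
    direction = 0  # +1 = same order in B, -1 = inverted, 0 = not yet known

    for gene in genome_a:
        # Offset in genome B between this gene and the last gene of the run.
        step = pos_b[gene] - pos_b[run[-1]] if run and gene in pos_b else None
        if step in (1, -1) and direction in (0, step):
            # Gene is adjacent in B with a consistent orientation: extend the run.
            direction = step
            run.append(gene)
            continue
        # Run is broken: keep it if long enough, then start a new one.
        if len(run) >= min_len:
            blocks.append((run, "+" if direction == 1 else "-"))
        run = [gene] if gene in pos_b else []
        direction = 0

    # Flush the final run.
    if len(run) >= min_len:
        blocks.append((run, "+" if direction == 1 else "-"))
    return blocks


# Hypothetical example: g1-g3 are inverted in genome B, g4 has no homolog
# there, and g5-g7 keep their order.
genome_a = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
genome_b = ["g3", "g2", "g1", "x1", "g5", "g6", "g7"]
print(find_synteny_blocks(genome_a, genome_b))
# [(['g1', 'g2', 'g3'], '-'), (['g5', 'g6', 'g7'], '+')]
```

Requiring strict adjacency keeps the logic easy to follow, but it means a single inserted or lost gene splits a block in two; allowing bounded gaps is the usual next refinement.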
Comparative genomics began almost as soon as the whole genomes of two organisms became available in 1995 (those of the bacteria Haemophilus influenzae and Mycoplasma genitalium), and it is now a standard component of the analysis of every new genome sequence.[2][20] With the explosion in the number of genome projects driven by advances in DNA sequencing technologies, particularly the next-generation sequencing methods of the late 2000s, the field has become more sophisticated, making it possible to handle many genomes in a single study.[21] Comparative genomics has revealed high levels of similarity between closely related organisms, such as humans and chimpanzees, and, more surprisingly, between seemingly distantly related organisms, such as humans and the yeast Saccharomyces cerevisiae.[22] It has also shown the extreme diversity of gene composition across evolutionary lineages.[20]
The C. elegans Sequencing Consortium (December 1998). "Genome sequence of the nematode C. elegans: a platform for investigating biology". Science. 282 (5396): 2012–2018. doi:10.1126/science.282.5396.2012. PMID 9851916.
Wong S, Wolfe KH (July 2005). "Birth of a metabolic gene cluster in yeast by adaptive gene relocation". Nature Genetics. 37 (7): 777–782. doi:10.1038/ng1584. PMID 15951822.
Ruelens P, de Maagd RA, Proost S, Theißen G, Geuten K, Kaufmann K (2013). "FLOWERING LOCUS C in monocots and the tandem origin of angiosperm-specific MADS-box genes". Nature Communications. 4: 2280. Bibcode:2013NatCo...4.2280R. doi:10.1038/ncomms3280. PMID 23955420.