Pan-genome

Pangenome analysis of Streptococcus agalactiae genomes made with the Anvi'o[1] software, whose development is led by A. Murat Eren. The genomes were obtained from Tettelin et al. (2005).[2] Each circle corresponds to one genome and each radius represents a gene family. The core genome families are located at the bottom and at the right of the figure. Some families in the core may have more than one homologous gene per genome. The shell genome is shown in the middle left of the figure. Families from the dispensable genome and singletons are shown at the top left.

In the fields of molecular biology and genetics, a pan-genome (also pangenome or supragenome) is the entire set of genes from all strains within a clade. More generally, it is the union of all the genomes of a clade.[2][3][4][5] The pan-genome can be broken down into a "core pangenome" that contains genes present in all individuals, a "shell pangenome" that contains genes present in two or more strains but not in all of them, and a "cloud pangenome" that contains genes found in only a single strain.[3][4][6] Some authors also refer to the cloud genome as the "accessory genome", which contains 'dispensable' genes present in a subset of the strains as well as strain-specific genes.[2][3][4] The use of the term 'dispensable' has been questioned, at least in plant genomes, as accessory genes play "an important role in genome evolution and in the complex interplay between the genome and the environment".[5] The field of study of pangenomes is called pangenomics.[2]
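
The core, shell and cloud partition described above can be illustrated with a minimal sketch. It assumes each strain has already been reduced to a set of gene family identifiers. The function name `classify_gene_families`, the strain names and the gene family names are hypothetical and chosen only for illustration; real pangenome tools first cluster genes into families by sequence similarity, which is not shown here.

```python
from collections import Counter

def classify_gene_families(genomes):
    """Split gene families into core, shell and cloud.

    genomes: dict mapping a strain name to the set of gene family
    identifiers found in that strain (hypothetical input format).
    """
    n_strains = len(genomes)
    # Count in how many strains each gene family occurs.
    counts = Counter(
        family for families in genomes.values() for family in set(families)
    )
    partition = {"core": set(), "shell": set(), "cloud": set()}
    for family, n in counts.items():
        if n == n_strains:
            partition["core"].add(family)   # present in all strains
        elif n >= 2:
            partition["shell"].add(family)  # two or more strains, not all
        else:
            partition["cloud"].add(family)  # a single strain only
    return partition

# Toy data: three made-up strains with made-up gene family names.
toy_genomes = {
    "strain_A": {"gyrA", "recA", "capK", "phiX"},
    "strain_B": {"gyrA", "recA", "capK"},
    "strain_C": {"gyrA", "recA", "tnpZ"},
}

partition = classify_gene_families(toy_genomes)
# The pan-genome is the union of the gene families of all strains.
pan_genome = set().union(*toy_genomes.values())

for name in ("core", "shell", "cloud"):
    print(name, sorted(partition[name]))
print("pan-genome size:", len(pan_genome))
```

In this toy data set the three categories are disjoint and together make up the pan-genome.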

The genetic repertoire of a bacterial species is much larger than the gene content of an individual strain.[7] Some species have open (or extensive) pangenomes, while others have closed pangenomes.[2] For species with a closed pan-genome, very few genes are added per sequenced genome (after sequencing many strains), and the size of the full pangenome can be theoretically predicted. Species with an open pangenome have enough genes added per additional sequenced genome that predicting the size of the full pangenome is impossible.[4] Population size and niche versatility have been suggested as the most influential factors in determining pan-genome size.[2]
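
The difference between open and closed pangenomes can be sketched by counting how many previously unseen gene families each additional genome contributes. The example below is a simplified illustration, not the procedure used in the cited studies: the function name `new_families_per_genome` and the toy gene sets are hypothetical, and the result depends on the order in which the genomes are added, so published analyses typically consider many orderings and fit a model to the resulting curve.

```python
def new_families_per_genome(ordered_genomes):
    """Return the number of new gene families added by each genome.

    ordered_genomes: list of sets of gene family identifiers, in the
    order in which the genomes are "sequenced" (hypothetical input).
    """
    seen = set()
    added = []
    for families in ordered_genomes:
        new = set(families) - seen   # families not seen in earlier genomes
        added.append(len(new))
        seen |= new                  # grow the running pan-genome
    return added

# Toy "closed" pangenome: later genomes add almost nothing new.
closed_like = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b", "c", "d"},
    {"a", "b", "c"},
]

# Toy "open" pangenome: every genome keeps adding new families.
open_like = [
    {"a", "b", "x1", "x2"},
    {"a", "b", "x3", "x4"},
    {"a", "b", "x5", "x6"},
]

closed_counts = new_families_per_genome(closed_like)
open_counts = new_families_per_genome(open_like)
print("closed-like:", closed_counts)
print("open-like:  ", open_counts)
```

A count that falls towards zero as genomes are added is the behaviour expected for a closed pangenome, while a count that stays well above zero is the behaviour expected for an open one.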

Pangenomes were originally constructed for species of bacteria and archaea, but more recently eukaryotic pan-genomes have been developed, particularly for plant species. Plant studies have shown that pan-genome dynamics are linked to transposable elements.[8][9][10][11] The pan-genome concept is significant in an evolutionary context, especially with relevance to metagenomics,[12] but it is also used in a broader genomics context.[13] An open access book reviewing the pangenome concept and its implications, edited by Tettelin and Medini, was published in the spring of 2020.[14]

  1. ^ Eren AM, Kiefl E, Shaiber A, Veseli I, Miller SE, Schechter MS, et al. (January 2021). "Community-led, integrated, reproducible multi-omics with anvi'o". Nature Microbiology. 6 (1): 3–6. doi:10.1038/s41564-020-00834-3. PMC 8116326. PMID 33349678.
  2. ^ a b c d e f Tettelin H, Masignani V, Cieslewicz MJ, Donati C, Medini D, Ward NL, Angiuoli SV, Crabtree J, Jones AL, Durkin AS, DeBoy RT (September 2005). "Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial 'pan-genome'". Proceedings of the National Academy of Sciences. 102 (39): 13950–13955. Bibcode:2005PNAS..10213950T. doi:10.1073/pnas.0506758102. ISSN 0027-8424. PMC 1216834. PMID 16172379.
  3. ^ a b c Medini D, Donati C, Tettelin H, Masignani V, Rappuoli R (December 2005). "The microbial pan-genome". Current Opinion in Genetics & Development. 15 (6): 589–94. doi:10.1016/j.gde.2005.09.006. PMID 16185861.
  4. ^ a b c d Vernikos G, Medini D, Riley DR, Tettelin H (February 2015). "Ten years of pan-genome analyses". Current Opinion in Microbiology. 23: 148–54. doi:10.1016/j.mib.2014.11.016. PMID 25483351.
  5. ^ a b Marroni F, Pinosio S, Morgante M (April 2014). "Structural variation and genome complexity: is dispensable really dispensable?". Current Opinion in Plant Biology. 18: 31–36. Bibcode:2014COPB...18...31M. doi:10.1016/j.pbi.2014.01.003. PMID 24548794.
  6. ^ Wolf YI, Makarova KS, Yutin N, Koonin EV (December 2012). "Updated clusters of orthologous genes for Archaea: a complex ancestor of the Archaea and the byways of horizontal gene transfer". Biology Direct. 7: 46. doi:10.1186/1745-6150-7-46. PMC 3534625. PMID 23241446.
  7. ^ Mira A, Martín-Cuadrado AB, D'Auria G, Rodríguez-Valera F (2010). "The bacterial pan-genome: a new paradigm in microbiology". Int Microbiol. 13 (2): 45–57. doi:10.2436/20.1501.01.110. PMID 20890839.
  8. ^ Morgante M, De Paoli E, Radovic S (April 2007). "Transposable elements and the plant pan-genomes". Current Opinion in Plant Biology. 10 (2): 149–55. Bibcode:2007COPB...10..149M. doi:10.1016/j.pbi.2007.02.001. PMID 17300983.
  9. ^ Gordon SP, Contreras-Moreira B, Woods DP, Des Marais DL, Burgess D, Shu S, et al. (December 2017). "Extensive gene content variation in the Brachypodium distachyon pan-genome correlates with population structure". Nature Communications. 8 (1): 2184. Bibcode:2017NatCo...8.2184G. doi:10.1038/s41467-017-02292-8. PMC 5736591. PMID 29259172.
  10. ^ Gordon SP, Contreras-Moreira B, Levy JH, Djamei A, Czedik-Eysenberg A, Tartaglio VS, et al. (July 2020). "Gradual polyploid genome evolution revealed by pan-genomic analysis of Brachypodium hybridum and its diploid progenitors". Nature Communications. 11 (1): 3670. Bibcode:2020NatCo..11.3670G. doi:10.1038/s41467-020-17302-5. PMC 7391716. PMID 32728126.
  11. ^ Contreras-Moreira B, Cantalapiedra CP, García-Pereira MJ, Gordon SP, Vogel JP, Igartua E, et al. (February 2017). "Analysis of Plant Pan-Genomes and Transcriptomes with GET_HOMOLOGUES-EST, a Clustering Solution for Sequences of the Same Species". Frontiers in Plant Science. 8: 184. doi:10.3389/fpls.2017.00184. PMC 5306281. PMID 28261241.
  12. ^ Reno ML, Held NL, Fields CJ, Burke PV, Whitaker RJ (May 2009). "Biogeography of the Sulfolobus islandicus pan-genome". Proceedings of the National Academy of Sciences of the United States of America. 106 (21): 8605–10. Bibcode:2009PNAS..106.8605R. doi:10.1073/pnas.0808945106. PMC 2689034. PMID 19435847.
  13. ^ Reinhardt JA, Baltrus DA, Nishimura MT, Jeck WR, Jones CD, Dangl JL (February 2009). "De novo assembly using low-coverage short read sequence data from the rice pathogen Pseudomonas syringae pv. oryzae". Genome Research. 19 (2): 294–305. doi:10.1101/gr.083311.108. PMC 2652211. PMID 19015323.
  14. ^ Tettelin H, Medini D (2020). Tettelin H, Medini D (eds.). The Pangenome (PDF). doi:10.1007/978-3-030-38281-0. ISBN 978-3-030-38280-3. PMID 32633908. S2CID 217167361.