Abstract

The pan-genome is defined as the combined set of all genes in the gene pool of a species. Pan-genome analyses have been very useful for understanding the different evolutionary dynamics of bacterial species: an open pan-genome often indicates a free-living lifestyle with metabolic versatility, while closed pan-genomes are linked to host-restricted, ecologically specialized bacteria. A detailed understanding of a species’ pan-genome has also been instrumental in tracking the phylodynamics of emerging drug-resistance mechanisms and drug-resistant pathogens. However, current approaches to analysing a species’ pan-genome do not take the species’ population structure into account, nor do they account for the uneven sampling of different lineages, which is commonplace because clinically relevant representatives are over-sampled. Here we present the application of a population structure-aware approach for classifying genes in a pan-genome based on their within-species distribution. We demonstrate our approach on a collection of 7500 genomes of Escherichia coli, one of the most-studied bacterial species and a model for an open pan-genome. We reveal clearly distinct groups of genes, clustered by different underlying evolutionary dynamics, and provide a more biologically informed and accurate description of the species’ pan-genome.
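Why uneven sampling distorts frequency-based gene categories can be illustrated with a minimal sketch. It is not the classification method used in this study. It assumes a gene presence/absence table and a lineage label for every genome (for example from a genome clustering tool). The core/shell/cloud cut-offs (`CORE_MIN`, `SHELL_MIN`) and all function and gene names are hypothetical. The only idea shown is that averaging within-lineage frequencies lets each lineage count once, however many of its genomes were sequenced.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical frequency cut-offs, chosen for illustration only;
# they are not the thresholds used in the study.
CORE_MIN = 0.95
SHELL_MIN = 0.15


def classify(freq: float) -> str:
    """Map a gene frequency in [0, 1] to a coarse pan-genome category."""
    if freq >= CORE_MIN:
        return "core"
    if freq >= SHELL_MIN:
        return "shell"
    return "cloud"


def naive_frequency(carriers: set[str], genomes: list[str]) -> float:
    """Fraction of all genomes carrying the gene, ignoring population structure."""
    return len(carriers & set(genomes)) / len(genomes)


def lineage_weighted_frequency(carriers: set[str], lineage_of: dict[str, str]) -> float:
    """Mean of the within-lineage gene frequencies, so that each lineage
    contributes equally regardless of how many genomes it contains."""
    members: dict[str, list[str]] = defaultdict(list)
    for genome, lineage in lineage_of.items():
        members[lineage].append(genome)
    per_lineage = [sum(g in carriers for g in gs) / len(gs) for gs in members.values()]
    return mean(per_lineage)


# Hypothetical toy data: lineage "A" is heavily over-sampled (48 genomes),
# while lineages "B" and "C" are represented by a single genome each.
lineage_of = {f"A{i}": "A" for i in range(48)}
lineage_of["B0"] = "B"
lineage_of["C0"] = "C"
genomes = list(lineage_of)

presence = {
    "geneAll": set(genomes),                                   # carried by every genome
    "geneA": {g for g in genomes if lineage_of[g] == "A"},     # confined to lineage A
    "geneB": {"B0"},                                           # confined to lineage B
}

for gene, carriers in presence.items():
    naive = naive_frequency(carriers, genomes)
    weighted = lineage_weighted_frequency(carriers, lineage_of)
    print(f"{gene}: naive {naive:.2f} ({classify(naive)}), "
          f"lineage-weighted {weighted:.2f} ({classify(weighted)})")
```

In this toy data, the gene confined to the over-sampled lineage looks "core" by raw counts (0.96). The gene confined to a rarely sampled lineage looks "cloud" (0.02). Weighting by lineage places both at 0.33, in the intermediate category.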

Funding
This study was supported by:
  • Wellcome Trust (Award 217303/Z/19/Z)
    • Principal Award Recipient: Eva Heinz
  • Woolf Fisher Scholarship
    • Principal Award Recipient: Stephanie McGimpsey
  • European Research Council (Award 742158)
    • Principal Award Recipient: Jukka Corander
  • Wellcome Sanger Institute (Award 206194)
    • Principal Award Recipient: Not applicable

This is an open-access article distributed under the terms of the Creative Commons Attribution License. This article was made open access via a Publish and Read agreement between the Microbiology Society and the corresponding author’s institution.

DOI: 10.1099/mgen.0.000670
2021-09-24
2021-10-24


Supplements

Supplementary material 1 (Excel)

Supplementary material 2 (PDF)
