Abstract

Horizontal gene transfer (HGT) and the resulting patterns of gene gain and loss are a fundamental part of bacterial evolution. Investigating these patterns can help us understand the role of selection in the evolution of bacterial pangenomes and how bacteria adapt to new niches. However, predicting the presence or absence of genes can be highly error-prone, and these errors can confound efforts to understand the dynamics of HGT. This review discusses both the challenges of accurately constructing a pangenome and the consequences that errors can have for downstream analyses. We hope that, by summarizing these issues, we can help researchers avoid potential pitfalls and improve bacterial pangenome analyses.

Funding
This study was supported by the:
  • European Research Council (Award 742158)
    • Principal Award Recipient: Jukka Corander
  • Norges Forskningsråd (Award 2999131)
    • Principal Award Recipient: Gerry Tonkin-Hill

This is an open-access article distributed under the terms of the Creative Commons Attribution License. This article was made open access via a Publish and Read agreement between the Microbiology Society and the corresponding author’s institution.
/content/journal/mgen/10.1099/mgen.0.001021
2023-05-25
2024-04-29
