Abstract

The genomic diversity of microbes is commonly parameterized as SNPs relative to a reference genome of a well-characterized, but arbitrary, isolate. However, any reference genome contains only a fraction of the microbial pangenome, the set of genes observed in a given species. Reference-based approaches are thus blind to the dynamics of the accessory genome, as well as to variation in gene order and copy number. With the widespread use of long-read sequencing, the number of high-quality, complete genome assemblies has increased dramatically. In addition to pangenomic approaches that focus on variation in the sets of genes present in different genomes, complete assemblies allow investigation of the evolution of genome structure and gene order. This latter problem, however, is computationally demanding, and few tools are available that shed light on these dynamics. Here, we present PanGraph, a Julia-based library and command line interface for aligning whole genomes into a graph. Each genome is represented as a path along vertices, each of which encapsulates a multiple sequence alignment of homologous sequence. The resultant data structure succinctly summarizes population-level nucleotide and structural polymorphisms and can be exported into several common formats for either downstream analysis or immediate visualization.
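The graph representation described above can be illustrated with a minimal sketch. It is written in Python rather than Julia and is not the library's implementation or API: the names `Block`, `Path`, `genome_sequence` and `to_gfa` are hypothetical, each vertex stores only a consensus sequence plus per-genome substitutions (indels are ignored), and GFA is assumed here as one example of a common export format.

```python
from dataclasses import dataclass, field

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class Block:
    """Hypothetical vertex: one homologous region shared by some genomes."""
    consensus: str
    # genome name -> {0-based position in consensus: substituted base}
    snps: dict = field(default_factory=dict)

    def sequence_for(self, genome: str) -> str:
        """Apply this genome's substitutions to the consensus."""
        seq = list(self.consensus)
        for pos, base in self.snps.get(genome, {}).items():
            seq[pos] = base
        return "".join(seq)


@dataclass
class Path:
    """Hypothetical genome: an ordered walk of (block id, is_forward) steps."""
    name: str
    steps: list


def genome_sequence(path: Path, blocks: dict) -> str:
    """Rebuild a genome by concatenating its blocks in path order."""
    parts = []
    for block_id, forward in path.steps:
        seq = blocks[block_id].sequence_for(path.name)
        parts.append(seq if forward else reverse_complement(seq))
    return "".join(parts)


def to_gfa(blocks: dict, paths: list) -> str:
    """Write segments and paths as GFA 1.0 text (link lines are omitted)."""
    lines = ["H\tVN:Z:1.0"]
    for block_id, block in blocks.items():
        lines.append(f"S\t{block_id}\t{block.consensus}")
    for path in paths:
        walk = ",".join(f"{b}{'+' if fwd else '-'}" for b, fwd in path.steps)
        lines.append(f"P\t{path.name}\t{walk}\t*")
    return "\n".join(lines) + "\n"


# Toy example: g2 lacks block b2, carries b3 inverted, and has one SNP in b1.
blocks = {
    "b1": Block("ACGT", {"g2": {1: "T"}}),
    "b2": Block("GGA"),
    "b3": Block("TTC"),
}
g1 = Path("g1", [("b1", True), ("b2", True), ("b3", True)])
g2 = Path("g2", [("b1", True), ("b3", False)])
```

In this toy graph, the absence of `b2` from the second path and the inverted traversal of `b3` are structural polymorphisms, while the substitution stored in `b1` is a nucleotide polymorphism. Both kinds of variation are recoverable from the same structure.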

  • This is an open-access article distributed under the terms of the Creative Commons Attribution License.
2023-06-06
2024-04-24
