Abstract

Repeats are the most diverse and dynamic but also the least well-understood component of microbial genomes. For all we know, repeat-associated mutations such as duplications, deletions, inversions and gene conversion might be as common as point mutations, but because of short-read myopia and methodological bias, they have received much less attention. Long-read DNA sequencing opens up the prospect of resolving repeats and systematically investigating the mutations they induce. For this study, we assembled the genomes of 16 closely related strains of the bacterial pathogen Mycobacterium tuberculosis from Pacific Biosciences HiFi reads, with the aim of characterizing the full spectrum of DNA polymorphisms. We found that complete and accurate genomes can be assembled from HiFi reads, with read length being the main limitation in the presence of duplications. By combining a reference-free pangenome graph with extensive repeat annotation, we identified 110 variants, 58 of which could be assigned to repeat-associated mutational mechanisms such as strand slippage and homologous recombination. Whilst recombination events were less frequent than point mutations, they affected large regions and introduced multiple variants at once, as shown by three gene conversion events and a 7.3 kb duplication encompassing two genes possibly involved in immune subversion. The vast majority of variants were present in single isolates, such that phylogenetic resolution increased only marginally when a tree was estimated from complete genomes. Our study shows that the contribution of repeat-associated mechanisms of mutation can be similar to that of point mutations at the microevolutionary scale of an outbreak. A large reservoir of unstudied genetic variation in this ‘monomorphic’ bacterial pathogen awaits investigation.

Funding
This study was supported by the:
  • Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (Award CRSII5_213514)
    • Principal Award Recipient: Sebastien Gagneux
  • Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (Award 320030-227432)
    • Principal Award Recipient: Sebastien Gagneux
  • European Research Council (Award 883582)
    • Principal Award Recipient: Sebastien Gagneux

This is an open-access article distributed under the terms of the Creative Commons Attribution License.
/content/journal/mgen/10.1099/mgen.0.001396
2025-05-01
2025-05-24
Supplements

  • Supplementary material 1 (PDF)
  • Supplementary material 2 (Excel)
  • Supplementary material 3 (Excel)