
Abstract

The occurrence of multiple strains of a bacterial pathogen within a single human host, referred to as a mixed infection, has important implications for both healthcare and public health. However, methods for detecting mixed infections from whole-genome sequencing (WGS) data, and especially for determining the proportions and identities of the underlying strains, have been limited. In this paper we introduce a novel method for addressing these challenges. Grounded in a rigorous statistical model, the method not only outperforms existing methods at proportion estimation on both simulated and real data, but also successfully determines the identities of the underlying strains. We conclude that it is a powerful addition to the existing toolkit of analytical methods for data from bacterial pathogens and holds the promise of enabling previously inaccessible conclusions to be drawn in public health microbiology.

Funding
This study was supported by:
  • Natural Sciences and Engineering Research Council of Canada (Award Discovery)
    • Principal Award Recipient: Maxwell Libbrecht
  • MRC Centre for Global Infectious Disease Analysis (Award MR/R015600/1)
    • Principal Award Recipient: Leonid Chindelevitch
  • Alfred P. Sloan Foundation (Award FG-2016-6392)
    • Principal Award Recipient: Leonid Chindelevitch
  • Genome Canada (Award Machine Learning Methods to Predict Drug Resistance in Pathogenic Bacteria)
    • Principal Award Recipient: Leonid Chindelevitch
  • CANSSI (Award Statistical methods for challenging problems in public health microbiology)
    • Principal Award Recipient: Leonid Chindelevitch

This is an open-access article distributed under the terms of the Creative Commons Attribution License. This article was made open access via a Publish and Read agreement between the Microbiology Society and the corresponding author’s institution.
/content/journal/mgen/10.1099/mgen.0.000607
2021-06-24
2024-11-06