
Abstract

Improved understanding of the genomic variants that allow Mycobacterium tuberculosis to acquire drug resistance or tolerance and to increase its virulence is an important factor in controlling the current tuberculosis epidemic. Current approaches to sequencing, however, cannot reveal the full genomic diversity of M. tuberculosis because of their strict requirements for low contamination levels, high sequence coverage and elimination of complex regions. We have developed the XBS (compleX Bacterial Samples) bioinformatics pipeline, which implements joint calling and machine-learning-based variant filtering tools specifically to improve variant detection in the important samples that do not meet these criteria, such as those from unbiased sputum samples. Using novel simulated datasets, which permit exact accuracy verification, XBS was compared to the UVP and MTBseq pipelines. Accuracy statistics showed that all three pipelines performed equally well for sequence data that resemble those obtained from culture isolates, with high depth of coverage and low-level contamination. In the complex genomic regions, however, XBS accurately identified 9.0 % more SNPs and 8.1 % more single nucleotide insertions and deletions than the WHO-endorsed unified analysis variant pipeline (UVP). XBS also had superior accuracy for sequence data that resemble those obtained directly from sputum samples, where depth of coverage is typically very low and contamination levels are high. XBS was the only pipeline not affected by low depth of coverage (5–10×), type of contamination and excessive contamination levels (>50 %). Whole genome sequencing (WGS) data from clinical samples confirmed the simulation results and the superior performance of XBS, which showed a higher sensitivity (98.8 %) when analysing culture isolates and identified 13.9 % more variable sites in WGS data from sputum samples as compared to MTBseq, without evidence of false-positive variants when rRNA regions were excluded.
The XBS pipeline facilitates sequencing of less-than-perfect samples. These advances will benefit future clinical applications of sequencing, especially WGS directly from clinical specimens, thereby avoiding biases and making many more samples available for drug resistance and other genomic analyses. The additional genetic resolution and increased sample success rate will improve genome-wide association studies and sequence-based transmission studies.
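
As an illustration of what exact accuracy verification against a simulated dataset can look like, the following minimal Python sketch compares a set of called variants with the known, simulated truth set and derives sensitivity and precision. The function name `score_calls`, the tuple representation of a variant and the toy positions are assumptions made for this example; they are not taken from XBS, UVP or MTBseq, and the published accuracy statistics may have been computed differently.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical variant representation: (position, reference allele, alternate allele).
Variant = Tuple[int, str, str]


class Accuracy(NamedTuple):
    true_positives: int
    false_positives: int
    false_negatives: int
    sensitivity: float
    precision: float


def score_calls(truth: Set[Variant], called: Set[Variant]) -> Accuracy:
    """Compare called variants with a simulated truth set by exact match."""
    tp = len(truth & called)   # simulated variants that were recovered
    fp = len(called - truth)   # calls absent from the simulation
    fn = len(truth - called)   # simulated variants that were missed
    # Guard against empty sets so the ratios are always defined.
    sensitivity = tp / (tp + fn) if truth else float("nan")
    precision = tp / (tp + fp) if called else float("nan")
    return Accuracy(tp, fp, fn, sensitivity, precision)


if __name__ == "__main__":
    # Toy data only: three SNPs and one single-nucleotide insertion.
    truth = {(100, "A", "G"), (250, "C", "T"), (400, "G", "GA"), (900, "T", "C")}
    called = {(100, "A", "G"), (250, "C", "T"), (400, "G", "GA"), (1200, "A", "C")}
    print(score_calls(truth, called))
```

Because the simulated variants are known exactly, every call can be classified as a true positive, false positive or false negative without relying on a second pipeline as a reference.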

Funding
This study was supported by the:
  • Fonds Wetenschappelijk Onderzoek (Award G0F8316N)
    • Principal Award Recipient: Annelies Van Rie
  • This is an open-access article distributed under the terms of the Creative Commons Attribution NonCommercial License.
/content/journal/mgen/10.1099/mgen.0.000689
2021-11-18
2024-06-24
