
Abstract

Pathogen genomic data are increasingly used to characterize global and local transmission patterns of important human pathogens and to inform public health interventions. Yet there is currently no consensus on how to measure genomic variation. To test the effect of the variant-identification approach on transmission inferences for Mycobacterium tuberculosis, we conducted an experiment in which five genomic epidemiology groups applied their variant-identification pipelines to the same outbreak sequence data. We compared the variants identified by each group, as well as the transmission and phylogenetic inferences made with each variant set. To measure the performance of commonly used variant-identification tools, we simulated an outbreak and compared the performance of three mapping algorithms, five variant callers and two variant filters in recovering true outbreak variants. Finally, we investigated the effect of applying increasingly stringent filters on transmission inferences and phylogenies. We found that the variant-calling approaches used by different groups do not recover consistent sets of variants, which can lead to conflicting transmission inferences. Further, performance in recovering true variation varied widely across approaches. While no single variant-identification approach outperforms the others in recovering both true genome-wide and outbreak-level variation, variant-identification algorithms that are calibrated upon real sequence data or that incorporate local reassembly outperform others in recovering true pairwise differences between isolates. The choice of variant filters contributed to extensive differences across pipelines, and applying increasingly stringent filters rapidly eroded the accuracy of transmission inferences and the quality of phylogenies reconstructed from outbreak variation. Commonly used approaches to identify genomic variation have variable performance, particularly when predicting potential transmission links from pairwise genetic distances. Phylogenetic reconstruction may be improved by less stringent variant filtering. Approaches that improve variant identification in repetitive, hypervariable regions, such as long-read assemblies, may improve transmission inference.
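
As a rough illustration of why filter stringency matters for distance-based transmission inference, the following minimal Python sketch computes pairwise SNP distances from per-isolate variant calls, scores one call set against a known truth set, and flags pairs that fall under a distance threshold. Everything in it is an assumption made for illustration: the three isolates, the variant positions, the masked interval and the threshold of 2 SNPs are toy values, and none of the function names correspond to the pipelines evaluated in the study.

```python
from itertools import combinations


def pairwise_snp_distances(calls):
    """Count sites at which each pair of isolates carries different alleles.

    `calls` maps isolate name -> {position: alt_allele}; a position missing
    from an isolate is treated as the reference allele.
    """
    dist = {}
    for a, b in combinations(sorted(calls), 2):
        sites = set(calls[a]) | set(calls[b])
        dist[(a, b)] = sum(calls[a].get(p) != calls[b].get(p) for p in sites)
    return dist


def apply_mask(calls, start, end):
    """Drop every call inside the closed interval [start, end] (a toy filter)."""
    return {
        iso: {p: alt for p, alt in variants.items() if not start <= p <= end}
        for iso, variants in calls.items()
    }


def precision_recall(called, truth):
    """Precision and recall of one isolate's calls against its true variants."""
    called_set, truth_set = set(called.items()), set(truth.items())
    tp = len(called_set & truth_set)
    precision = tp / len(called_set) if called_set else 0.0
    recall = tp / len(truth_set) if truth_set else 0.0
    return precision, recall


def linked_pairs(dist, threshold):
    """Pairs whose distance is at or below the (illustrative) threshold."""
    return sorted(pair for pair, d in dist.items() if d <= threshold)


# Toy "true" variants for three hypothetical isolates.
truth = {
    "A": {100: "T", 250: "G"},
    "B": {100: "T", 250: "G", 900: "C"},
    "C": {100: "T", 400: "A", 650: "T", 900: "C", 1200: "G"},
}

# A stringent filter that masks a hypothetical hypervariable interval.
filtered = apply_mask(truth, 350, 1300)

true_dist = pairwise_snp_distances(truth)
filtered_dist = pairwise_snp_distances(filtered)

if __name__ == "__main__":
    print("true distances:    ", true_dist)
    print("filtered distances:", filtered_dist)
    print("links (true):      ", linked_pairs(true_dist, 2))
    print("links (filtered):  ", linked_pairs(filtered_dist, 2))
    print("isolate C (precision, recall):", precision_recall(filtered["C"], truth["C"]))
```

In this toy case the masked call set keeps perfect precision but loses most of the variation that separates isolate C from the others, so all three pairs fall under the threshold even though only one pair does in the unfiltered data. That is one simple mechanism by which aggressive filtering could distort distance-based links; the study's own results rest on its real and simulated outbreak data, not on this sketch.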

Funding
This study was supported by:
  • Jason R. Andrews, National Institute of Allergy and Infectious Diseases (Award R01 AI130058)
  • Katharine S. Walter, Stanford University, Maternal and Child Health Research Institute (US) (Postdoctoral Support Award)

/content/journal/mgen/10.1099/mgen.0.000418
2020-07-31
2020-08-06

Supplements

Supplementary material 1
