Abstract

The recent widespread application of whole-genome sequencing (WGS) for microbial disease investigations has spurred the development of new bioinformatics tools, including a notable proliferation of phylogenomics pipelines designed for infectious disease surveillance and outbreak investigation. Transitioning the use of WGS data out of the research laboratory and into the front lines of surveillance and outbreak response requires user-friendly, reproducible and scalable pipelines that have been well validated. Single Nucleotide Variant Phylogenomics (SNVPhyl) is a bioinformatics pipeline for identifying high-quality single-nucleotide variants (SNVs) and constructing a whole-genome phylogeny from a collection of WGS reads and a reference genome. Individual pipeline components are integrated into the Galaxy bioinformatics framework, enabling data analysis in a user-friendly, reproducible and scalable environment. We show that SNVPhyl can detect SNVs with high sensitivity and specificity, and can identify and remove regions of high SNV density (indicative of recombination). SNVPhyl correctly distinguishes outbreak from non-outbreak isolates across a range of variant-calling settings and sequencing-coverage thresholds, and in the presence of contamination. SNVPhyl is available as a Galaxy workflow, as Docker and virtual machine images, and as a Unix-based command-line application. SNVPhyl is released under the Apache 2.0 license and is available at http://snvphyl.readthedocs.io/ or at https://github.com/phac-nml/snvphyl-galaxy.
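The idea of removing regions of high SNV density can be illustrated with a minimal sliding-window sketch. This is not SNVPhyl's implementation: the function names, the window size and the density threshold below are hypothetical values chosen only for illustration, and the input is assumed to be a plain list of SNV positions on a single reference sequence.

```python
from typing import List, Set


def high_density_positions(positions: List[int], window: int, threshold: int) -> Set[int]:
    """Return SNV positions lying in any span shorter than `window` bp
    that contains at least `threshold` SNVs (hypothetical helper)."""
    ordered = sorted(set(positions))
    flagged: Set[int] = set()
    start = 0
    for end, pos in enumerate(ordered):
        # Advance the left edge until the span from ordered[start] to pos
        # is shorter than `window` bp.
        while pos - ordered[start] >= window:
            start += 1
        # Every SNV in ordered[start..end] now falls inside one window.
        if end - start + 1 >= threshold:
            flagged.update(ordered[start:end + 1])
    return flagged


def filter_snvs(positions: List[int], window: int = 20, threshold: int = 3) -> List[int]:
    """Drop SNVs in high-density regions; the defaults are illustrative assumptions."""
    flagged = high_density_positions(positions, window, threshold)
    return [p for p in sorted(set(positions)) if p not in flagged]


if __name__ == "__main__":
    snvs = [100, 105, 110, 500, 900, 905]
    # Three SNVs within 20 bp are treated as a dense (possibly recombinant) region.
    print(filter_snvs(snvs, window=20, threshold=3))
```

Under these assumed settings, the three clustered positions are discarded and the isolated ones are kept. A real pipeline would apply such a filter per reference contig, and typically per genome, before building the SNV alignment.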

/content/journal/mgen/10.1099/mgen.0.000116
2017-06-08
2019-08-24