
Abstract

Whole-genome sequencing (WGS) of bacterial isolates has become standard practice in many laboratories. Applications of WGS analysis include phylogeography and molecular epidemiology, with single nucleotide polymorphisms (SNPs) serving as the unit of evolution. The Northern Arizona SNP Pipeline (NASP) was developed as a reproducible method that scales well to the hundreds to thousands of WGS datasets typically used in comparative genomics applications. In this study, we demonstrate how NASP compares with other tools in the analysis of two real bacterial genomics datasets and one simulated dataset. Our results demonstrate that NASP produces results similar to, and often better than, those of other pipelines, while being much more flexible in terms of data input types, job management systems, diversity of supported tools and output formats. We also demonstrate how results differ depending on the choice of reference genome and on whether phylogenies are inferred from concatenated SNPs or from alignments that include monomorphic positions. NASP is a source-available, version-controlled, unit-tested method and can be obtained from tgennorth.github.io/NASP.
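The distinction between concatenated SNPs and alignments that retain monomorphic positions can be illustrated with a minimal sketch. This is not NASP's implementation: the function names `split_columns` and `concatenate_snps`, the toy alignment, and the rule of skipping any column that contains a non-ACGT character are all assumptions made for illustration.

```python
def split_columns(alignment):
    """Classify alignment columns as polymorphic (SNP) or monomorphic.

    `alignment` maps sample name -> aligned sequence (equal lengths).
    Columns containing anything other than A, C, G or T are skipped
    (an illustrative filtering rule, not taken from the paper).
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")

    snps, mono = [], []
    for i in range(length):
        bases = {s[i].upper() for s in seqs}
        if not bases <= set("ACGT"):
            continue  # ambiguous or gapped column: excluded from both sets
        if len(bases) > 1:
            snps.append(i)
        else:
            mono.append(i)
    return snps, mono


def concatenate_snps(alignment, snp_columns):
    """Build the SNP-only matrix by joining the polymorphic columns."""
    return {
        name: "".join(seq[i] for i in snp_columns)
        for name, seq in alignment.items()
    }


# Hypothetical toy alignment of a reference and three samples.
toy = {
    "ref": "ACGTACGT",
    "s1": "ACGTACGA",
    "s2": "ATGTACGA",
    "s3": "ATGTNCGT",
}

snp_cols, mono_cols = split_columns(toy)
snp_matrix = concatenate_snps(toy, snp_cols)
print(snp_cols, mono_cols)
print(snp_matrix)
```

A tree inferred from `snp_matrix` sees only variable sites, whereas one inferred from the SNP and monomorphic columns together also sees the invariant sites, which is the contrast the abstract refers to.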

Keywords: bioinformatics, phylogeography, SNPs

/content/journal/mgen/10.1099/mgen.0.000074
2016-08-25
2019-08-25
