Comprehensive assessment of the quality of Salmonella whole genome sequence data available in public sequence databases using the Salmonella in silico Typing Resource (SISTR)

Open Access

Abstract

Public health and food safety institutions around the world are adopting whole genome sequencing (WGS) to replace conventional methods for characterizing Salmonella for use in surveillance and outbreak response. Falling costs and increased throughput of WGS have resulted in an explosion of data, but questions remain as to the reliability and robustness of the data. Due to the critical importance of serovar information to public health, it is essential to have reliable serovar assignments available for all of the Salmonella records. The current study systematically assessed and curated all Salmonella records in the Sequence Read Archive (SRA) to determine the state of the data and their utility. A total of 67 758 genomes were assembled de novo and quality-assessed for their assembly metrics as well as species and serovar assignments. A total of 42 400 genomes passed all of the quality criteria, but 30.16 % of genomes were deposited without serotype information. These data were used to compare the concordance of reported and predicted serovars for two in silico prediction tools, multi-locus sequence typing (MLST) and the Salmonella in silico Typing Resource (SISTR), whose predictions were fully concordant with the reported serovar for 87.51 and 91.91 % of the tested isolates, respectively. Concordance of in silico predictions increased to 89.25 % for MLST and 94.98 % for SISTR when serovar variants were grouped together. This study represents the first large-scale validation of serovar information in public genomes and provides a large validated set of genomes, which can be used to benchmark new bioinformatics tools.
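
The concordance figures above are percentages of isolates whose reported serovar matches the predicted one, computed once with exact matching and once with serovar variants grouped together. The following is a minimal Python sketch of that arithmetic, not the authors' pipeline. The function names, the label normalization, the rule of skipping records with no reported serovar, the example records, and the single grouping entry are all assumptions made for illustration; none of them come from the study.

```python
from typing import Dict, Iterable, Optional, Tuple


def normalize(label: str) -> str:
    """Lower-case a serovar label and collapse whitespace (assumed cleanup rule)."""
    return " ".join(label.strip().lower().split())


def concordance(
    pairs: Iterable[Tuple[str, str]],
    groups: Optional[Dict[str, str]] = None,
) -> float:
    """Return the percentage of (reported, predicted) pairs that agree.

    `groups` optionally maps a variant label to the serovar it is grouped
    with. Records with an empty reported serovar are skipped, since there
    is nothing to compare against (an assumption of this sketch).
    """
    lookup = {normalize(k): normalize(v) for k, v in (groups or {}).items()}
    total = 0
    matches = 0
    for reported, predicted in pairs:
        r = normalize(reported)
        if not r:
            continue  # deposited without serovar information
        p = normalize(predicted)
        # Replace each label by its group representative, if it has one.
        r = lookup.get(r, r)
        p = lookup.get(p, p)
        total += 1
        matches += int(r == p)
    if total == 0:
        raise ValueError("no records with a reported serovar")
    return 100.0 * matches / total


# Invented example records (reported, predicted); not data from the study.
records = [
    ("Enteritidis", "Enteritidis"),
    ("Typhimurium", "I 4,[5],12:i:-"),
    ("Newport", "Newport"),
    ("Heidelberg", "Kentucky"),
    ("", "Infantis"),
]

# Hypothetical grouping table with a single variant entry.
variant_groups = {"I 4,[5],12:i:-": "Typhimurium"}

strict = concordance(records)
grouped = concordance(records, variant_groups)
print(f"strict: {strict:.2f} %, grouped: {grouped:.2f} %")
```

In this toy input, grouping the variant with its parent serovar turns one mismatch into a match, which is the same mechanism by which grouped concordance can exceed fully concordant matching.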

/content/journal/mgen/10.1099/mgen.0.000151
2018-01-17
2024-03-29


Supplements

Supplementary File 1

Supplementary File 2
