Abstract

A large European multi-country Salmonella enterica serovar Enteritidis outbreak associated with Polish eggs was characterized by whole-genome sequencing (WGS)-based analysis, with various European institutes using different analysis workflows to identify isolates potentially related to the outbreak. The objective of our study was to compare the output of six of these typing workflows (distance matrices produced by either SNP-based or allele-based workflows) in terms of cluster detection and concordance. To this end, we analysed a set of 180 isolates from confirmed and probable outbreak cases, representative of the genetic variation within the outbreak, supplemented with 22 unrelated contemporaneous S. enterica serovar Enteritidis isolates. Because defining a cluster cut-off based on genetic distance requires prior knowledge of the evolutionary processes that govern the bacterial populations in question, we used several hierarchical clustering methods (single, average and complete linkage) and selected the optimal number of clusters based on the consensus of the silhouette, Dunn2 and McClain–Rao internal validation indices. External validation was performed by calculating the concordance with the WGS-based case definition (SNP-address) for this outbreak using the Fowlkes–Mallows index. Our analysis indicates that, with complete-linkage hierarchical clustering and the optimal number of clusters as defined by the three internal validation indices, the six allele- and SNP-based typing workflows generate clusters with similar compositions. Furthermore, we show that, even in the absence of coordinated typing procedures, the workflows currently in use by six European public-health authorities can identify concordant clusters of genetically related S. enterica serovar Enteritidis isolates when an unsupervised machine-learning methodology is used for cluster delineation, thus providing public-health researchers with comparable tools for the detection of infectious-disease outbreaks.
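The clustering and validation procedure summarised above can be illustrated with a short, dependency-free Python sketch. It is not the authors' implementation. The toy distance matrix, the function names and the rank-sum consensus are hypothetical choices made only for illustration. Hierarchical clustering is implemented naively for single, average and complete linkage on a precomputed distance matrix, such as one produced by a SNP- or allele-based workflow. Each candidate number of clusters is then scored with the silhouette, Dunn2 and McClain–Rao indices. Finally, the Fowlkes–Mallows index measures agreement with a reference partition that stands in for the SNP-address case definition.

The Dunn2 formula used here divides the smallest mean between-cluster distance by the largest mean within-cluster distance. This is our reading of the variant reported by the R fpc package and should be checked against that package's documentation. Summing ranks is an assumed, simplified stand-in for a formal consensus or rank-aggregation step.

```python
from itertools import combinations
from math import sqrt

def _mean(values):
    values = list(values)
    return sum(values) / len(values)

def hclust(dist, k, method="complete"):
    """Naive agglomerative clustering of a symmetric distance matrix
    (e.g. pairwise SNP or allele differences), stopped at k clusters."""
    linkage = {"single": min, "complete": max, "average": _mean}[method]
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None  # (linkage distance, index a, index b)
        for a, b in combinations(range(len(clusters)), 2):
            d = linkage([dist[i][j] for i in clusters[a] for j in clusters[b]])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a is still valid
        clusters[a].extend(merged)
    labels = [0] * len(dist)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

def mean_silhouette(dist, labels):
    """Average silhouette width (higher is better); singletons score 0."""
    n, scores = len(dist), []
    for i in range(n):
        own = [dist[i][j] for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = _mean(own)
        b = min(_mean(dist[i][j] for j in range(n) if labels[j] == c)
                for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return _mean(scores)

def dunn2(dist, labels):
    """Smallest mean between-cluster distance divided by the largest mean
    within-cluster distance (higher is better)."""
    groups = {c: [i for i, lab in enumerate(labels) if lab == c] for c in set(labels)}
    between = min(_mean(dist[i][j] for i in groups[p] for j in groups[q])
                  for p, q in combinations(groups, 2))
    within = max(_mean(dist[i][j] for i, j in combinations(groups[c], 2))
                 for c in groups if len(groups[c]) > 1)
    return between / within if within > 0 else float("inf")

def mcclain_rao(dist, labels):
    """Mean within-cluster distance divided by mean between-cluster
    distance over all isolate pairs (lower is better)."""
    pairs = list(combinations(range(len(dist)), 2))
    within = _mean(dist[i][j] for i, j in pairs if labels[i] == labels[j])
    between = _mean(dist[i][j] for i, j in pairs if labels[i] != labels[j])
    return within / between

def choose_k(dist, method="complete", ks=range(2, 6)):
    """Pick k by summing each candidate's rank under the three indices
    (a simple, hypothetical stand-in for formal rank aggregation)."""
    ks = list(ks)
    runs = {k: hclust(dist, k, method) for k in ks}
    indices = [(mean_silhouette, True), (dunn2, True), (mcclain_rao, False)]
    rank_sum = dict.fromkeys(ks, 0)
    for index, higher_is_better in indices:
        ordered = sorted(ks, key=lambda k: index(dist, runs[k]),
                         reverse=higher_is_better)
        for rank, k in enumerate(ordered):
            rank_sum[k] += rank
    best = min(ks, key=lambda k: rank_sum[k])
    return best, runs[best]

def fowlkes_mallows(labels_a, labels_b):
    """Pair-counting Fowlkes-Mallows index between two partitions."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        tp += same_a and same_b
        fp += same_a and not same_b
        fn += same_b and not same_a
    return tp / sqrt((tp + fp) * (tp + fn)) if tp else 0.0

# Toy example: 8 isolates, distance = absolute difference of positions.
positions = [0, 1, 2, 20, 21, 22, 50, 51]
dist = [[abs(x - y) for y in positions] for x in positions]
k, labels = choose_k(dist, method="complete")
reference = [0, 0, 0, 1, 1, 1, 2, 2]  # hypothetical SNP-address-like labels
print(k, labels, fowlkes_mallows(labels, reference))
```

On this toy matrix, the rank sum selects three clusters. These clusters coincide with the hypothetical reference labels, giving a Fowlkes–Mallows index of 1.0. Single and average linkage recover the same three groups here because the gaps between groups are much larger than the distances within them. Complete linkage is the most conservative of the three methods: it merges two clusters only when every cross-cluster pair of members lies within the merge distance. This limits the chaining together of loosely related isolates.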

Funding
This study was supported by:
  • Dutch Ministry of Health, Welfare and Sports
    • Principal Award Recipient: Not Applicable

This is an open-access article distributed under the terms of the Creative Commons Attribution NonCommercial License.

DOI: 10.1099/mgen.0.000318
2020-02-26
2024-09-14

Supplements

Supplementary material 1 (PDF)

Supplementary material 2 (Excel)