
Abstract

The partitioning of pathogenic strains isolated from the environment or from human cases among their sources is challenging. The pathogens usually colonize multiple animal hosts, including livestock, which contaminate the food-production chain and the environment (e.g. soil and water), posing an additional public-health burden and making the source difficult to identify. Genomic data open up new opportunities for the development of statistical models that indicate the likely source of pathogen contamination. Here, we propose a computationally fast and efficient multinomial logistic regression source-attribution classifier to predict the animal source of bacterial isolates based on ‘source-enriched’ loci extracted from the accessory-genome profiles of a pangenomic dataset. Depending on the accuracy of the model’s self-attribution step, the modeller selects the number of candidate accessory genes that best fit the model for calculating the likelihood of (source) category membership. The Accessory genes-Based Source Attribution (AB_SA) method was applied to a dataset of strains of Salmonella Typhimurium and its monophasic variant (S. 1,4,[5],12:i:-). The model was trained on 69 strains with known animal-source categories (i.e. poultry, ruminant and pig). The AB_SA method identified 8 genes as predictors among the 2802 accessory genes. The self-attribution accuracy was 80 %. The AB_SA model was then able to classify 25 of the 29 S. Typhimurium and S. 1,4,[5],12:i:- isolates collected from the environment (considered to be of unknown source) into a specific category (i.e. animal source), with a probability of more than 85 %. The AB_SA method described here provides a user-friendly and valuable tool for performing source-attribution studies in only a few steps. AB_SA is written in R and freely available at https://github.com/lguillier/AB_SA.
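
To make the classification idea in the abstract concrete, the following is a minimal Python sketch of a multinomial logistic regression fitted to gene presence/absence profiles. It is not the authors' implementation (AB_SA is written in R): the toy profiles, the three made-up genes, the function names, the plain gradient-descent fit and the use of 0.85 as a decision cut-off are all assumptions chosen for illustration only.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(weights, profile):
    """Return one probability per source for a presence/absence profile."""
    x = [1.0] + [float(v) for v in profile]  # leading 1.0 is the intercept
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])

def train(profiles, labels, n_sources, lr=0.5, epochs=500):
    """Fit multinomial logistic regression by batch gradient descent."""
    n_features = len(profiles[0]) + 1
    weights = [[0.0] * n_features for _ in range(n_sources)]
    for _ in range(epochs):
        grads = [[0.0] * n_features for _ in range(n_sources)]
        for profile, label in zip(profiles, labels):
            probs = predict_proba(weights, profile)
            x = [1.0] + [float(v) for v in profile]
            for k in range(n_sources):
                err = probs[k] - (1.0 if k == label else 0.0)
                for j in range(n_features):
                    grads[k][j] += err * x[j]
        for k in range(n_sources):
            for j in range(n_features):
                weights[k][j] -= lr * grads[k][j] / len(profiles)
    return weights

def self_attribution_accuracy(weights, profiles, labels):
    """Fraction of training isolates reassigned to their known source."""
    hits = 0
    for profile, label in zip(profiles, labels):
        probs = predict_proba(weights, profile)
        hits += probs.index(max(probs)) == label
    return hits / len(profiles)

def attribute(weights, profile, sources, threshold=0.85):
    """Name the most likely source, or None if its probability is too low."""
    probs = predict_proba(weights, profile)
    best = probs.index(max(probs))
    return (sources[best] if probs[best] > threshold else None), probs[best]

SOURCES = ["poultry", "ruminant", "pig"]
# Hypothetical toy data: rows are isolates, columns are three made-up
# accessory genes (1 = present, 0 = absent). Not taken from the study.
TRAIN_PROFILES = [
    [1, 0, 0], [1, 0, 0], [1, 0, 0],
    [0, 1, 0], [0, 1, 0], [0, 1, 0],
    [0, 0, 1], [0, 0, 1], [0, 0, 1],
]
TRAIN_LABELS = [0, 0, 0, 1, 1, 1, 2, 2, 2]

weights = train(TRAIN_PROFILES, TRAIN_LABELS, len(SOURCES))
print(self_attribution_accuracy(weights, TRAIN_PROFILES, TRAIN_LABELS))
print(attribute(weights, [0, 0, 1], SOURCES))  # clear-cut profile
print(attribute(weights, [1, 1, 0], SOURCES))  # ambiguous profile
```

In this sketch an isolate whose profile matches two sources about equally well is left unattributed rather than forced into a category, which mirrors the way the abstract reports only those environmental isolates classified with a high probability.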

Funding
This study was supported by:
  • H2020 COMPARE (Award 643476)

/content/journal/mgen/10.1099/mgen.0.000366
2020-04-22
2020-06-04


Supplements

Supplementary material 1

EXCEL