Abstract

MLST (multi-locus sequence typing) is a classic technique for genotyping bacteria, widely applied for pathogen outbreak surveillance. Traditionally, MLST is based on identifying sequence types from a small number of housekeeping genes. With the increasing availability of whole-genome sequencing data, MLST methods have evolved towards larger typing schemes, based on a few hundred genes [core genome MLST (cgMLST)] to a few thousand genes [whole genome MLST (wgMLST)]. Such large-scale MLST schemes have been shown to provide a finer resolution and are increasingly used in various contexts such as hospital outbreaks or foodborne pathogen outbreaks. This methodological shift raises new computational challenges, especially given the large size of the schemes involved. Very few available MLST callers are currently capable of dealing with large MLST schemes. We introduce MentaLiST, a new MLST caller, based on a k-mer voting algorithm and written in the Julia language, specifically designed and implemented to handle large typing schemes. We test it on real and simulated data to show that MentaLiST is faster than any other available MLST caller while providing the same or better accuracy, and is capable of dealing with MLST schemes with up to thousands of genes while requiring limited computational resources. MentaLiST source code and easy installation instructions using a Conda package are available at https://github.com/WGS-TB/MentaLiST.
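
To make the idea of k-mer voting concrete, the following is a minimal Python sketch of a generic k-mer voting allele caller. It is an illustration under stated assumptions, not MentaLiST's actual implementation (which is written in Julia and whose details are not given in this abstract). The function names (`kmers`, `build_index`, `call_alleles`), the dictionary layout of the scheme, the toy sequences and the choice of k = 5 are all hypothetical. Reverse complements, vote weighting and novel-allele detection are deliberately left out.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(scheme, k):
    """Map each k-mer to the set of (locus, allele) pairs that contain it.

    `scheme` is a hypothetical dict: locus name -> {allele id: sequence}.
    """
    index = defaultdict(set)
    for locus, alleles in scheme.items():
        for allele_id, seq in alleles.items():
            for kmer in kmers(seq, k):
                index[kmer].add((locus, allele_id))
    return index


def call_alleles(reads, index, k):
    """Return, for each locus, the allele that collected the most votes."""
    votes = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for kmer in kmers(read, k):
            # Each read k-mer casts one vote for every allele containing it.
            for locus, allele_id in index.get(kmer, ()):
                votes[locus][allele_id] += 1
    return {locus: max(tally, key=tally.get) for locus, tally in votes.items()}


# Toy data (assumed): one locus, two alleles differing at a single position.
scheme = {"locusA": {1: "ACGTACGTTGCA", 2: "ACGTACCTTGCA"}}
reads = ["GTACCTTG", "ACCTTGCA"]
index = build_index(scheme, k=5)
calls = call_alleles(reads, index, k=5)
print(calls)
```

In this toy example both reads come from allele 2, so k-mers spanning the variable position vote only for that allele. Because the index is built once per scheme and reads are processed without alignment or assembly, an approach of this kind can plausibly scale to schemes with thousands of loci.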

/content/journal/mgen/10.1099/mgen.0.000146
2018-01-10
2019-08-24
