
Abstract

Assembly of bacterial short-read whole-genome sequencing data frequently results in hundreds of contigs whose origin, plasmid or chromosome, is unclear. Complete genomes resolved by long-read sequencing can be used to generate and label short-read contigs. We used these labelled contigs to train several popular machine-learning methods to classify the origin of contigs from Enterococcus faecium, Klebsiella pneumoniae and Escherichia coli using pentamer frequencies. We selected support-vector machine (SVM) models as the best classifier for all three bacterial species (F1-score E. faecium=0.92, F1-score K. pneumoniae=0.90, F1-score E. coli=0.76); these models outperformed other existing plasmid prediction tools on a benchmarking set of isolates. We demonstrated the scalability of our models by accurately predicting the plasmidome of a large collection of 1644 E. faecium isolates, and illustrated their applicability by predicting the location of antibiotic-resistance genes in all three species. The SVM classifiers are publicly available as an R package and graphical user interface called ‘mlplasmids’. We anticipate that this tool may significantly facilitate research on the dissemination of plasmids encoding antibiotic resistance and/or contributing to host adaptation.
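To make the feature representation concrete, the following minimal sketch computes pentamer (5-mer) frequencies for a single contig. It is an illustration only and not the mlplasmids implementation: the function name `pentamer_frequencies`, the choice to skip windows containing ambiguous bases, and the normalisation to relative frequencies are all assumptions made for this example.

```python
from collections import Counter
from itertools import product
from typing import List

K = 5  # pentamers, as named in the abstract
ALPHABET = "ACGT"
# All 4**5 = 1024 possible pentamers in a fixed (lexicographic) order.
KMERS = ["".join(p) for p in product(ALPHABET, repeat=K)]


def pentamer_frequencies(sequence: str) -> List[float]:
    """Return relative frequencies of the 1024 pentamers in one contig.

    Hypothetical helper: windows containing characters other than
    A, C, G or T (for example N) are ignored, which is an assumption
    of this sketch rather than documented behaviour of any tool.
    """
    seq = sequence.upper()
    # Count every overlapping window of length K.
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    # Keep only windows made of unambiguous bases.
    valid = [counts[kmer] for kmer in KMERS]
    total = sum(valid)
    if total == 0:
        # Contig shorter than K, or no unambiguous window.
        return [0.0] * len(KMERS)
    return [count / total for count in valid]


if __name__ == "__main__":
    freqs = pentamer_frequencies("ACGTACGTAC")
    print(len(freqs), freqs[KMERS.index("ACGTA")])
```

Each contig then becomes a fixed-length vector of 1024 values, which is the kind of input a classifier such as an SVM can be trained on together with the plasmid or chromosome label derived from the complete genomes.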

Erratum
This article contains a correction applying to the following content:
Corrigendum: mlplasmids: a user-friendly tool to predict plasmid- and chromosome-derived sequences for single species

/content/journal/mgen/10.1099/mgen.0.000224
2018-11-01
2019-12-09

Supplements

Supplementary File 1

PDF

Supplementary File 2
