RFPlasmid: predicting plasmid sequences from short-read assembly data using machine learning

Abstract

Antimicrobial-resistance (AMR) genes in bacteria are often carried on plasmids, and these plasmids can transfer AMR genes between bacteria. For molecular epidemiology and risk assessment, it is important to know whether the genes are located on highly transferable plasmids or on the more stable chromosomes. However, draft whole-genome sequences are fragmented, which makes it difficult to distinguish plasmid contigs from chromosomal contigs. Current methods that predict plasmid sequences from draft genome sequences rely on single features, such as k-mer composition, circularity of the DNA molecule, copy number or sequence identity to plasmid replication genes. Each of these features has drawbacks, especially for large single-copy plasmids, which often carry resistance genes. With our newly developed prediction tool RFPlasmid, we use a combination of multiple features, including k-mer composition and databases of plasmid and chromosomal marker proteins, to predict whether the likely source of a contig is plasmid or chromosomal. RFPlasmid supports models for 17 different bacterial taxa, including , and , and has a taxon-agnostic model for metagenomic assemblies or unsupported organisms. RFPlasmid is available both as a standalone tool and via a web interface.
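As a rough illustration of what combining multiple features per contig can look like, the Python sketch below builds one feature vector from k-mer composition and marker-protein hits. It is not RFPlasmid's implementation: the choice of k = 3, the function names (`kmer_profile`, `marker_fractions`, `contig_features`), the example protein IDs and the idea of encoding marker hits as per-contig fractions are all assumptions made for illustration. The two marker sets stand in for the plasmid and chromosomal marker-protein databases mentioned above.

```python
from itertools import product

BASES = "ACGT"


def kmer_profile(seq: str, k: int = 3) -> list[float]:
    """Relative frequency of each of the 4**k k-mers, in lexicographic order.

    k-mers containing non-ACGT characters (e.g. N from scaffolding gaps)
    are skipped, so the frequencies sum to 1 over the valid k-mers only.
    """
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # ignores k-mers with ambiguous bases
            counts[kmer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def marker_fractions(protein_hits: list[str],
                     plasmid_markers: set[str],
                     chromosome_markers: set[str]) -> list[float]:
    """Fraction of a contig's proteins that match each (hypothetical) marker set."""
    n = len(protein_hits)
    if n == 0:
        return [0.0, 0.0]
    plasmid = sum(1 for h in protein_hits if h in plasmid_markers)
    chromosome = sum(1 for h in protein_hits if h in chromosome_markers)
    return [plasmid / n, chromosome / n]


def contig_features(seq: str, protein_hits: list[str],
                    plasmid_markers: set[str], chromosome_markers: set[str],
                    k: int = 3) -> list[float]:
    """Concatenate composition and marker features into one vector per contig."""
    return (kmer_profile(seq, k)
            + marker_fractions(protein_hits, plasmid_markers, chromosome_markers))


if __name__ == "__main__":
    # Illustrative IDs only: repA (plasmid replication) and dnaA (chromosomal
    # replication initiator) stand in for entries of real marker databases.
    features = contig_features("ACGTACGTNNACGT",
                               ["repA", "dnaA", "hypothetical"],
                               {"repA"}, {"dnaA"})
    print(len(features))  # 66 = 64 k-mer frequencies + 2 marker fractions
```

Vectors like this, computed for contigs of known origin, could then be used to train a classifier such as a random forest. In a real pipeline the marker hits would presumably come from aligning each contig's predicted proteins against the marker databases rather than from pre-labelled IDs.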

DOI: 10.1099/mgen.0.000683
2021-11-30
2024-03-29
Supplements

Supplementary material 1

PDF
