On the (im)possibility of reconstructing plasmids from whole-genome short-read sequencing data

Sergio Arredondo-Alonso; Rob J. Willems; Willem van Schaik; Anita C. Schürch

doi:10.1099/mgen.0.000128

Volume 3, Issue 10

Other

Open Access

Sergio Arredondo-Alonso¹, Rob J. Willems¹, Willem van Schaik¹,² and Anita C. Schürch¹,*
Affiliations:
¹ Department of Medical Microbiology, Universitair Medisch Centrum Utrecht, Utrecht, The Netherlands
² Institute of Microbiology and Infection, University of Birmingham, Birmingham, England, UK
*Correspondence: Anita C. Schürch [email protected]
Published: 18 August 2017 https://doi.org/10.1099/mgen.0.000128

Abstract

To benchmark algorithms for automated plasmid sequence reconstruction from short-read sequencing data, we selected 42 publicly available complete bacterial genome sequences spanning 12 genera and containing 148 plasmids. We predicted plasmids from short-read data with four programs (PlasmidSPAdes, Recycler, cBar and PlasmidFinder) and compared the output to the reference sequences. PlasmidSPAdes reconstructs plasmids based on coverage differences in the assembly graph. It reconstructed most of the reference plasmids (recall=0.82), but approximately a quarter of the predicted plasmid contigs were false positives (precision=0.75). PlasmidSPAdes merged 84 % of the predictions from genomes with multiple plasmids into a single bin. Recycler, which searches the assembly graph for sub-graphs corresponding to circular sequences, correctly predicted small plasmids but failed with long plasmids (recall=0.12, precision=0.30). cBar, which applies pentamer frequency analysis to detect plasmid-derived contigs, showed a recall of 0.76 and a precision of 0.62. However, cBar only classifies contigs as plasmid-derived and does not bin them into individual plasmids. PlasmidFinder, which searches for replicon sequences, had the highest precision (1.0), but its recall (0.36) was limited by the contents of its database and by the contig length obtained from de novo assembly. PlasmidSPAdes and Recycler detected putative small plasmids (<10 kbp) that were absent from the original assembly but were also predicted as plasmids by cBar. This study shows that small plasmids can be predicted automatically. Prediction of large plasmids (>50 kbp) containing repeated sequences remains challenging and limits the high-throughput analysis of plasmids from short-read whole-genome sequencing data.
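The precision and recall values reported in the abstract can be illustrated with a minimal Python sketch. It assumes that every assembled contig has already been labelled "plasmid" or "chromosome" by alignment to the complete reference genome and that both metrics are counted per contig; the study's exact counting unit (contigs, base pairs or whole plasmids) may differ, and the function name `precision_recall` and the contig identifiers are hypothetical.

```python
# Hypothetical sketch: contig-level precision and recall for a plasmid predictor.
# Assumes every contig already carries a ground-truth label obtained by aligning
# it to the complete reference genome; this is not the study's own pipeline.
from typing import Dict, Set, Tuple


def precision_recall(predicted: Set[str], truth: Dict[str, str]) -> Tuple[float, float]:
    """Return (precision, recall) for the contigs a tool called plasmid-derived."""
    true_plasmid = {contig for contig, label in truth.items() if label == "plasmid"}
    tp = len(predicted & true_plasmid)   # plasmid contigs correctly called plasmid
    fp = len(predicted - true_plasmid)   # other contigs wrongly called plasmid
    fn = len(true_plasmid - predicted)   # plasmid contigs the tool missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy example: five plasmid contigs and two chromosomal contigs.
    truth = {
        "p1": "plasmid", "p2": "plasmid", "p3": "plasmid",
        "p4": "plasmid", "p5": "plasmid",
        "c1": "chromosome", "c2": "chromosome",
    }
    # The tool calls four contigs plasmid-derived; one of them (c1) is chromosomal.
    predicted = {"p1", "p2", "p3", "c1"}
    print(precision_recall(predicted, truth))  # (0.75, 0.6)
```

In this toy case a precision of 0.75 means that one in four contigs called plasmid-derived did not belong to a reference plasmid, which is the sense in which approximately a quarter of the PlasmidSPAdes predictions were false positives.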

Received: 17/05/2017
Accepted: 18/07/2017
Published Online: 18/08/2017

Keywords: bacterial genomes, DNA sequence analysis, mobile genetic elements, plasmids and replicon benchmarking

This is an open access article under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.

References

Smalla K, Jechalke S, Top EM. Plasmid detection, characterization, and ecology. Microbiol Spectr 2015; 3:PLAS-0038-2014
Conlan S, Thomas PJ, Deming C, Park M, Lau AF et al. Single-molecule sequencing to track plasmid diversity of hospital-associated carbapenemase-producing Enterobacteriaceae. Sci Transl Med 2014; 6:254ra126
De Toro M, Garcillán-Barcia MP, De La Cruz F. Plasmid diversity and adaptation analyzed by massive sequencing of Escherichia coli plasmids. Microbiol Spectr 2014; 2:PLAS-0031
Carattoli A, Zankari E, García-Fernández A, Voldby Larsen M, Lund O et al. In silico detection and typing of plasmids using PlasmidFinder and plasmid multilocus sequence typing. Antimicrob Agents Chemother 2014; 58:3895–3903
Zhou F, Xu Y. cBar: a computer program to distinguish plasmid-derived from chromosome-derived sequence fragments in metagenomics data. Bioinformatics 2010; 26:2051–2052
Lanza VF, de Toro M, Garcillán-Barcia MP, Mora A, Blanco J et al. Plasmid flux in Escherichia coli ST131 sublineages, analyzed by plasmid constellation network (PLACNET), a new method for plasmid reconstruction from whole genome sequences. PLoS Genet 2014; 10:e1004766
Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 2003; 13:2498–2504
de Been M, Lanza VF, de Toro M, Scharringa J, Dohmen W et al. Dissemination of cephalosporin resistance genes between Escherichia coli strains from farm animals and humans by specific plasmid lineages. PLoS Genet 2014; 10:e1004776
Rozov R, Brown Kav A, Bogumil D, Shterzer N, Halperin E et al. Recycler: an algorithm for detecting plasmids from de novo assembly graphs. Bioinformatics 2017; 33:475–482
Antipov D, Hartwick N, Shen M, Raiko M, Lapidus A et al. PlasmidSPAdes: assembling plasmids from whole genome sequencing data. Bioinformatics 2016; 32:3380–3387
Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 2012; 19:455–477
Prjibelski AD, Vasilinetc I, Bankevich A, Gurevich A, Krivosheeva T et al. ExSPAnder: a universal repeat resolver for DNA fragment assembly. Bioinformatics 2014; 30:i293–i301
Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics 2013; 29:1072–1075
Harrison PW, Lower RP, Kim NK, Young JP. Introducing the bacterial 'chromid': not a chromosome, not a plasmid. Trends Microbiol 2010; 18:141–148
Wick RR, Judd LM, Gorrie CL, Holt KE. Unicycler: resolving bacterial genome assemblies from short and long sequencing reads. PLoS Comput Biol 2017; 13:e1005595
Forde BM, Ben Zakour NL, Stanton-Cook M, Phan MD, Totsika M et al. The complete genome sequence of Escherichia coli EC958: a high quality reference sequence for the globally disseminated multidrug resistant E. coli O25b:H4-ST131 clone. PLoS One 2014; 9:e104400
Koren S, Phillippy AM. One chromosome, one contig: complete microbial genomes from long-read sequencing and assembly. Curr Opin Microbiol 2015; 23:110–120

Microbial Genomics 3, e000128 (2017); https://doi.org/10.1099/mgen.0.000128

Supplementary material: Supplementary File 1, Supplementary File 2 and Supplementary File 3
