Abstract

Plasmids are key elements in horizontal gene transfer within microbial communities. Recently, a large number of experimental and computational methods have been developed to obtain the plasmidomes of microbial communities. Distinguishing transmissible plasmid sequences, which are derived from conjugative or at least mobilizable plasmids, from non-transmissible plasmid sequences in the plasmidome is essential for understanding the diversity of plasmids and how they regulate the microbial community. Unfortunately, because DNA sequences in the plasmidome are highly fragmented, effective identification methods are lacking. In this work, we used information entropy to assess the randomness of synonymous codon usage across 4424 plasmid genomes. The results showed that, for all amino acids, the choice of synonymous codon is more random in conjugative and mobilizable plasmids than in non-transmissible plasmids, indicating that transmissible plasmids have sequence signatures different from those of non-transmissible plasmids. Inspired by this observation, we developed a novel algorithm named PlasTrans. PlasTrans takes the triplet code sequences and base sequences of plasmid DNA fragments as input and uses a convolutional neural network, a deep learning technique, to extract more complex signatures from the plasmid sequences and identify conjugative and mobilizable DNA fragments. Tests showed that PlasTrans could achieve an AUC as high as 84–91%, even when the fragments contained only hundreds of base pairs. To the best of our knowledge, this is the first quantitative analysis of the difference in sequence signatures between transmissible and non-transmissible plasmids, and PlasTrans is the first tool to perform transferability annotation for DNA fragments in the plasmidome. We expect that PlasTrans will be a useful tool for researchers who analyse the properties of novel plasmids in microbial communities and horizontal gene transfer, especially the spread of plasmid-associated resistance genes and virulence factors. PlasTrans is freely available at https://github.com/zhenchengfang/PlasTrans
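
As an illustration of the entropy measure mentioned in the abstract, the following is a minimal sketch of how the randomness of synonymous codon choice could be quantified for one amino acid. The abstract does not give the exact formula, so this assumes the standard Shannon entropy over the relative frequencies of the codons in a synonymous family, with codons read in frame 0. The function names `codon_counts` and `synonymous_entropy` and the small `SYNONYMOUS_CODONS` table are hypothetical and are not taken from PlasTrans.

```python
import math
from collections import Counter

# A few synonymous codon families from the standard genetic code.
# Illustrative subset only; a full analysis would cover every amino acid.
SYNONYMOUS_CODONS = {
    "Lys": ["AAA", "AAG"],
    "Ala": ["GCT", "GCC", "GCA", "GCG"],
    "Leu": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
}


def codon_counts(cds):
    """Count in-frame codons of a coding sequence, reading from position 0."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def synonymous_entropy(counts, codons):
    """Shannon entropy (in bits) of codon choice within one synonymous family.

    Returns None when the amino acid does not occur in the sequence.
    Assumption: plain Shannon entropy; the paper may normalise differently.
    """
    total = sum(counts[c] for c in codons)
    if total == 0:
        return None
    entropy = 0.0
    for c in codons:
        p = counts[c] / total
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy


if __name__ == "__main__":
    counts = codon_counts("AAAAAGGCTGCCGCAGCG")
    for amino_acid, codons in SYNONYMOUS_CODONS.items():
        print(amino_acid, synonymous_entropy(counts, codons))
```

Under this definition, a higher value means the synonymous codons are used more evenly (more randomly), and a value of zero means a single codon is always chosen.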

Funding
This study was supported by:
  • Hongwei Zhou, National Natural Science Foundation of China (CN) (Award 81925026)

/content/journal/mgen/10.1099/mgen.0.000459
2020-10-19
2020-11-25

Supplements

Supplementary material 1

EXCEL