
Abstract

Predicting plasmid host range is crucial for investigating plasmid dissemination and the plasmid-mediated transfer of resistance and virulence genes. Several machine learning-based tools have been developed to predict plasmid host ranges. These tools were trained and tested on the bacterial host records of plasmids in related databases. Typically, a plasmid genome in a database such as that of the National Center for Biotechnology Information is annotated with only one or a few bacterial hosts, which do not encompass all possible hosts. Consequently, existing methods may significantly underestimate the host ranges of mobile plasmids. In this work, we propose a novel method named HRPredict, which employs a word vector model to numerically represent the proteins encoded on plasmid genomes. Since it is difficult to confirm which hosts a particular plasmid definitely cannot enter, we framed the prediction of whether a plasmid can enter a specific bacterium as a learning task without negative samples. Using multiple one-class support vector machine (SVM) models, which do not require negative samples for training, HRPredict predicts the host range of plasmids across 45 families, 56 genera and 56 species. For the benchmark test set, we constructed reliable negative samples for each host taxonomic unit via two indirect methods and found that the area under the curve (AUC), F1-score, recall, precision and accuracy of most taxonomic unit prediction models exceeded 0.9. Among the 13 broad-host-range plasmid types, HRPredict demonstrated greater coverage than HOTSPOT and PlasmidHostFinder, successfully predicting the majority of previously reported hosts. Through feature importance calculation for each SVM model, we found that genes closely related to the plasmid host range are involved in functions such as bacterial adaptability, pathogenicity and survival. These findings provide significant insight into the mechanisms by which bacteria adapt to diverse environments through plasmids. The HRPredict algorithm is expected to facilitate in-depth research on the spread of broad-host-range plasmids and enable host-range predictions for novel plasmids reconstructed from microbiome sequencing data.
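
The idea of training one model per host taxon from positive samples only can be illustrated with a small sketch. This is not the HRPredict implementation: the abstract describes word-vector protein embeddings and one-class SVMs, whereas the code below swaps in a dependency-free centroid-and-radius rule as a stand-in for a one-class SVM. The class and function names, the two taxa and the 2-D toy vectors are all hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import mean


def plasmid_vector(protein_vectors):
    """Average per-protein vectors into a single plasmid-level vector."""
    return [mean(column) for column in zip(*protein_vectors)]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class CentroidOneClassModel:
    """Stand-in for a one-class SVM (assumption): fitted on positives only.

    A vector is accepted if it lies no farther from the centroid of the
    training positives than the most distant training positive.
    """

    def fit(self, positives):
        self.centroid = [mean(column) for column in zip(*positives)]
        self.radius = max(distance(p, self.centroid) for p in positives)
        return self

    def predict(self, vector):
        return distance(vector, self.centroid) <= self.radius


def predict_host_range(models, vector):
    """Return every taxon whose one-class model accepts the plasmid vector."""
    return sorted(taxon for taxon, model in models.items() if model.predict(vector))


def binary_metrics(y_true, y_pred):
    """Precision, recall, F1 and accuracy from 0/1 labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Toy positives: plasmid vectors recorded in each (hypothetical) host genus.
training = {
    "Escherichia": [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]],
    "Staphylococcus": [[5.0, 5.0], [5.2, 4.9], [4.8, 5.1], [5.1, 5.3]],
}

# One independent model per taxon; no negative samples are used for fitting.
models = {taxon: CentroidOneClassModel().fit(vs) for taxon, vs in training.items()}

# A query plasmid represented by the mean of its (toy) protein vectors.
query = plasmid_vector([[0.1, 0.1], [0.2, 0.2]])
print(predict_host_range(models, query))
```

Because each taxon has its own model, a plasmid can be accepted by several models at once, which is how a multi-host prediction arises in this setup. Negative samples enter only at evaluation time, for example through `binary_metrics`, which mirrors the abstract's use of indirectly constructed negatives in the benchmark test set.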

Funding
This study was supported by the:
  • National Natural Science Foundation of China (Award 82104625)
    • Principal Award Recipient: Waijiao Tang
  • National Natural Science Foundation of China (Award 82372300)
    • Principal Award Recipient: Zhencheng Fang
  • National Natural Science Foundation of China (Award 82102508)
    • Principal Award Recipient: Zhencheng Fang
  • National Key Research and Development Program of China (Award 2022YFA0806400)
    • Principal Award Recipient: Hongwei Zhou

This is an open-access article distributed under the terms of the Creative Commons Attribution License.

/content/journal/mgen/10.1099/mgen.0.001355
2025-02-11
2026-03-14


References

  1. Rodríguez-Beltrán J, DelaFuente J, León-Sampedro R, MacLean RC, San Millán Á. Beyond horizontal gene transfer: the role of plasmids in bacterial evolution. Nat Rev Microbiol 2021; 19:347–359
  2. Francia MV, Varsaki A, Garcillán-Barcia MP, Latorre A, Drainas C et al. A classification scheme for mobilization regions of bacterial plasmids. FEMS Microbiol Rev 2004; 28:79–100
  3. Smillie C, Garcillán-Barcia MP, Francia MV, Rocha EPC, de la Cruz F. Mobility of plasmids. Microbiol Mol Biol Rev 2010; 74:434–452
  4. Garcillán-Barcia MP, Alvarado A, de la Cruz F. Identification of bacterial plasmids based on mobility and plasmid population biology. FEMS Microbiol Rev 2011; 35:936–956
  5. Arnold BJ, Huang IT, Hanage WP. Horizontal gene transfer and adaptive evolution in bacteria. Nat Rev Microbiol 2022; 20:206–218
  6. Brito IL. Examining horizontal gene transfer in microbial communities. Nat Rev Microbiol 2021; 19:442–453
  7. Ji Y, Shang J, Tang X, Sun Y. HOTSPOT: hierarchical host prediction for assembled plasmid contigs with transformer. Bioinformatics 2023; 39:btad283
  8. Aytan-Aktug D, Clausen PTLC, Szarvas J, Munk P, Otani S et al. PlasmidHostFinder: prediction of plasmid hosts using random forest. mSystems 2022; 7:e0118021
  9. Fang Z, Zhou H. Identification of the conjugative and mobilizable plasmid fragments in the plasmidome using sequence signatures. Microb Genom 2020; 6:mgen000459
  10. Fang Z, Tan J, Wu S, Li M, Wang C et al. PlasGUN: gene prediction in plasmid metagenomic short reads using deep learning. Bioinformatics 2020; 36:3239–3241
  11. Ng P. Dna2vec: consistent vector representations of variable-length k-mers; 2017; arXiv preprint arXiv:1701.06279
  12. Asgari E, Mofrad MRK. Continuous distributed representation of biological sequences for deep proteomics and genomics. PLoS One 2015; 10:e0141287
  13. Abramson J, Adler J, Dunger J, Evans R, Green T et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024; 630:493–500
  14. Zhu YH, Zhang C, Yu DJ, Zhang Y. Integrating unsupervised language model with triplet neural networks for protein gene ontology prediction. PLoS Comput Biol 2022; 18:e1010793
  15. Wu J, Ouyang J, Qin H, Zhou J, Roberts R et al. PLM-ARG: antibiotic resistance gene identification using a pretrained protein language model. Bioinformatics 2023; 39
  16. Liu W, Wang Z, You R, Xie C, Wei H et al. PLMSearch: protein language model powers accurate and fast sequence search for remote homology. Nat Commun 2024; 15:2775
  17. Feng T, Wu S, Zhou H, Fang Z. MOBFinder: a tool for mobilization typing of plasmid metagenomic fragments based on a language model. Gigascience 2024; 13:giae047
  18. Wu S, Feng T, Tang W, Qi C, Gao J et al. metaProbiotics: a tool for mining probiotic from metagenomic binning data based on a language model. Brief Bioinform 2024; 25:bbae085
  19. Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 2014; 30:2068–2069
  20. Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 2006; 22:1658–1659
  21. Suzuki H, Yano H, Brown CJ, Top EM. Predicting plasmid promiscuity based on genomic signature. J Bacteriol 2010; 192:6045–6055
  22. Wisniewski JA, Traore DA, Bannam TL, Lyras D, Whisstock JC et al. TcpM: a novel relaxase that mediates transfer of large conjugative plasmids from Clostridium perfringens. Mol Microbiol 2016; 99:884–896
  23. Ramachandran G, Miguel-Arribas A, Abia D, Singh PK, Crespo I et al. Discovery of a new family of relaxases in Firmicutes bacteria. PLoS Genet 2017; 13:e1006586
  24. Rajeev L, Luning EG, Dehal PS, Price MN, Arkin AP et al. Systematic mapping of two component response regulators to gene targets in a model sulfate reducing bacterium. Genome Biol 2011; 12:R99
  25. Turton JF, Ward ME, Woodford N, Kaufmann ME, Pike R et al. The role of ISAba1 in expression of OXA carbapenemase genes in Acinetobacter baumannii. FEMS Microbiol Lett 2006; 258:72–77
  26. Miller D, Stern A, Burstein D. Deciphering microbial gene function using natural language processing. Nat Commun 2022; 13:5731
  27. Tsukiyama S, Hasan MM, Fujii S, Kurata H. LSTM-PHV: prediction of human-virus protein-protein interactions by LSTM with word2vec. Brief Bioinform 2021; 22:bbab228
  28. Lin Z, Akin H, Rao R, Hie B, Zhu Z et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 2023; 379:1123–1130
  29. Brandes N, Ofer D, Peleg Y, Rappoport N, Linial M. ProteinBERT: a universal deep-learning model of protein sequence and function. Bioinformatics 2022; 38:2102–2110

Supplements

Supplementary material 1 (PDF)
Supplementary material 2 (Excel)
Supplementary material 3 (Excel)