
Abstract

With the continued evolution of DNA sequencing technologies, genome sequence data have become integral to the classification and identification of Bacteria and Archaea. Six years after introducing EzBioCloud, an integrated platform representing the taxonomic hierarchy of Bacteria and Archaea through quality-controlled 16S rRNA gene and genome sequences, we present an updated version that further refines and expands its capabilities. The update responds to the growing need for accurate taxonomic information as the definition of a species increasingly relies on genome sequence comparisons. We have also incorporated an advanced strategy for addressing underrepresented or less-studied lineages, bolstering the comprehensiveness and accuracy of our database. Our rigorous quality control protocols remain in place: whole-genome assemblies from the NCBI Assembly Database undergo stringent screening to remove low-quality sequence data. The assemblies that pass are then run through our enhanced identification pipeline, which begins with a 16S rRNA gene similarity search and then calculates the average nucleotide identity (ANI). For genome sequences lacking a 16S rRNA sequence and without a closely related genomic representative for ANI calculation, we apply a different ANI approach that uses bacterial core genes for improved taxonomic placement (core gene ANI, cgANI). Owing to the increase in genome sequences available in NCBI and our newly introduced cgANI method, EzBioCloud now encompasses a total of 109 835 species. Of these, 21 964 have validly published names, 47 896 are candidate species identified either through 16S rRNA sequence similarity (phylotypes) or through whole-genome ANI (genomospecies), and the remaining 39 975 were positioned in the taxonomic tree by cgANI (species clusters). The EzBioCloud database is accessible at www.ezbiocloud.net/db.
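
The identification flow described above can be sketched roughly as follows. This is a minimal illustration, not the EzBioCloud pipeline itself: the `Evidence` and `Placement` types, the `place_genome` function, and both cutoff values are assumptions introduced here, since the abstract does not state the thresholds or data structures used. The sketch also assumes that similarity and ANI values have already been computed by external tools.

```python
from dataclasses import dataclass
from typing import NamedTuple, Optional

# Assumed cutoffs for illustration only; the abstract does not give the values used.
ANI_SPECIES_CUTOFF = 95.0    # percent
CGANI_SPECIES_CUTOFF = 95.0  # percent


@dataclass
class Evidence:
    """Hypothetical precomputed best-hit results for one quality-screened assembly."""
    ssu_similarity: Optional[float] = None  # best 16S rRNA similarity (%); None if no 16S gene
    ani: Optional[float] = None             # ANI (%) to closest representative; None if none exists
    ani_hit: Optional[str] = None           # species name of the ANI best hit
    cgani: Optional[float] = None           # core gene ANI (%) to the best hit
    cgani_hit: Optional[str] = None         # species name of the cgANI best hit


class Placement(NamedTuple):
    method: str             # "ANI", "cgANI", or "unplaced"
    species: Optional[str]  # None means no existing species matched


def place_genome(ev: Evidence) -> Placement:
    # Primary route: a 16S rRNA search found a related representative and ANI was computed.
    if ev.ssu_similarity is not None and ev.ani is not None:
        if ev.ani >= ANI_SPECIES_CUTOFF:
            return Placement("ANI", ev.ani_hit)
        return Placement("ANI", None)
    # Fallback route: no 16S rRNA gene or no close representative, so use core gene ANI.
    if ev.cgani is not None:
        if ev.cgani >= CGANI_SPECIES_CUTOFF:
            return Placement("cgANI", ev.cgani_hit)
        return Placement("cgANI", None)
    # Neither kind of evidence is available.
    return Placement("unplaced", None)
```

In this sketch, a `species` of `None` on the ANI route would correspond to a candidate genomospecies, and on the cgANI route to a new species cluster. How the actual pipeline treats a genome that has a 16S rRNA gene but no close genomic representative is not specified in the abstract; the sketch simply sends it to the cgANI fallback.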

Funding
This study was supported by:
  • CJ Bioscience, Inc.
    • Principal Award Recipient: Mauricio Chalita [Co-First Author]

This is an open-access article distributed under the terms of the Creative Commons Attribution License.
/content/journal/ijsem/10.1099/ijsem.0.006421
2024-06-18
2024-07-15
Supplements

Supplementary material 1

EXCEL