
Abstract

Bacteria are fascinating research objects in many disciplines, and whole-genome sequencing (WGS) has become the paramount methodology to advance our microbiological understanding. Meanwhile, access to cost-effective sequencing platforms has accelerated bacterial WGS to unprecedented levels, introducing new challenges in terms of data accessibility, computational demands, heterogeneity of analysis workflows and, ultimately, the scientific usability of the data. To address these challenges, a previous study released a uniformly processed set of 661 405 bacterial genome assemblies obtained from the European Nucleotide Archive as of November 2018. Building on this work, we conducted further genome-based analyses of all genomes, including taxonomic classification, multilocus sequence typing and annotation. Here, we present BakRep, a searchable large-scale web repository of these genomes enriched with consistent genome characterizations and original metadata. The platform provides a flexible search engine combining taxonomic, genomic and metadata information, as well as interactive elements to visualize genomic features. Furthermore, all results can be downloaded for offline analyses via an accompanying command line tool. The web repository is accessible via https://bakrep.computational.bio.

Funding
This study was supported by the:
  • German Network for Bioinformatics Infrastructure (Award 031A532B, 031A533A, 031A533B, 031A534A, 031A535A, 031A537A, 031A537B, 031A537C, 031A537D, 031A538A)
    • Principal Award Recipient: Alexander Goesmann
  • This is an open-access article distributed under the terms of the Creative Commons Attribution License. This article was made open access via a Publish and Read agreement between the Microbiology Society and the corresponding author’s institution.

/content/journal/mgen/10.1099/mgen.0.001305
2024-10-30
2025-12-07

