Abstract

In metagenomic analyses of microbiomes, one of the first steps is usually the taxonomic classification of reads by comparison to a database of previously taxonomically classified genomes. While different studies comparing metagenomic taxonomic classification methods have determined that different tools are ‘best’, two tools have been used the most to date: Kraken (k-mer-based classification against a user-constructed database) and MetaPhlAn (classification by alignment to clade-specific marker genes), the latest versions of which are Kraken2 and MetaPhlAn 3, respectively. When we used both Kraken2 and MetaPhlAn 3 to classify reads within metagenomes from human-associated or environmental datasets, we found large discrepancies in both the proportion of reads that were classified and the number of species that were identified. We then used a range of simulated and mock samples to investigate which of these tools would give classifications closest to the real composition of metagenomic samples, and examined the combined impact of tool–parameter–database choice on the resulting taxonomic classifications. This revealed that there may not be a one-size-fits-all ‘best’ choice. While Kraken2 can achieve better overall performance than MetaPhlAn 3, with higher precision, recall and F1 scores, as well as alpha- and beta-diversity measures closer to the known composition, the computational resources required for this may be prohibitive for many researchers, and the default database and parameters should not be used. We therefore conclude that the best tool–parameter–database choice for a particular application depends on the scientific question of interest, which performance metric is most important for this question and the limit of available computational resources.
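
As a rough illustration of the precision, recall and F1 scores mentioned above, the sketch below scores a set of predicted species against a known mock community on a presence/absence basis. This is an assumption about how such scores can be computed, not necessarily the exact procedure used in the study; the function name and the example taxa are hypothetical.

```python
def precision_recall_f1(expected, observed):
    """Presence/absence scores for predicted taxa against a known composition.

    Assumption: a taxon counts as a true positive if it appears in both sets;
    abundances are ignored in this simplified sketch.
    """
    expected, observed = set(expected), set(observed)
    true_pos = len(expected & observed)
    precision = true_pos / len(observed) if observed else 0.0
    recall = true_pos / len(expected) if expected else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical mock community and classifier output (illustrative only).
known = {
    "Escherichia coli",
    "Bacillus subtilis",
    "Listeria monocytogenes",
    "Staphylococcus aureus",
}
predicted = {"Escherichia coli", "Bacillus subtilis", "Salmonella enterica"}

p, r, f1 = precision_recall_f1(known, predicted)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Here a false positive (a species reported but not present) lowers precision, while a missed species lowers recall, which is why the choice of metric can change which tool looks ‘best’.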

Funding
This study was supported by the:
  • Dalhousie Medical Research Foundation
    • Principal Award Recipient: Robyn J Wright
  • Natural Sciences and Engineering Research Council of Canada (Award 2016-05039)
    • Principal Award Recipient: Morgan GI Langille

This is an open-access article distributed under the terms of the Creative Commons Attribution License.
/content/journal/mgen/10.1099/mgen.0.000949
2023-03-03
2024-03-03