
Abstract

Culture-independent metagenomic detection of microbial species has the potential to provide rapid and precise real-time diagnostic results. However, it is potentially limited by sequencing and taxonomic classification errors. We used simulated and real-world data to benchmark rates of species misclassification using 100 reference genomes for each of ten common bloodstream pathogens and six frequent blood-culture contaminants (n=1568; only 68 genomes were available for ). Simulating reads both with and without sequencing error for both the Illumina and Oxford Nanopore platforms, we evaluated commonly used classification tools including Kraken2, Bracken and Centrifuge, utilizing mini (8 GB) and standard (30–50 GB) databases. Bracken with the standard database performed best: the median percentage of reads across both sequencing platforms identified correctly to the species level was 97.8% (IQR 92.7:99.0) [range 5:100]. For Kraken2 with a mini database, a commonly used combination, median species-level identification was 86.4% (IQR 50.5:93.7) [range 4.3:100]. Classification performance varied by species, with being more challenging to classify correctly (probability of reads being assigned to the correct species: 56.1–96.0%, varying by tool used). Human read misclassification was negligible. After filtering out shorter Nanopore reads, we found performance similar or superior to that of Illumina sequencing, despite higher sequencing error rates. Misclassification was more common when the misclassified species had a higher average nucleotide identity to the true species. Our findings highlight that taxonomic misclassification of sequencing data occurs and varies by sequencing and analysis workflow. To account for ‘bioinformatic contamination’, we present a contamination catalogue that can be used in metagenomic pipelines to ensure accurate results that can support clinical decision making.
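
The headline metric above (the percentage of reads identified correctly to the species level, summarised as a median with IQR and range across genomes) can be illustrated with a minimal sketch. This is not the study's pipeline: the function names, the input layout (one list of per-read species labels per simulated genome) and the toy labels are all assumptions made for illustration, and the quartile method used here may differ from the one the authors applied.

```python
from statistics import median, quantiles


def percent_correct(assigned_species, true_species):
    """Percentage of reads whose assigned species equals the true species.

    `assigned_species` is a hypothetical list with one species label per read,
    as might be parsed from a classifier's per-read output.
    """
    if not assigned_species:
        raise ValueError("no reads to score")
    hits = sum(1 for label in assigned_species if label == true_species)
    return 100.0 * hits / len(assigned_species)


def summarise(percentages):
    """Median, quartiles and range of per-genome percentages.

    Uses the 'inclusive' quartile method from the standard library; this
    choice is an assumption, not something stated in the abstract.
    """
    q1, _, q3 = quantiles(percentages, n=4, method="inclusive")
    return {
        "median": median(percentages),
        "q1": q1,
        "q3": q3,
        "min": min(percentages),
        "max": max(percentages),
    }


if __name__ == "__main__":
    # Toy labels only; "species_A" and "species_B" are placeholders, not study data.
    per_genome = [
        percent_correct(["species_A"] * 4, "species_A"),
        percent_correct(["species_A", "species_A", "species_B", "species_A"], "species_A"),
        percent_correct(["species_A", "species_B"], "species_A"),
    ]
    print(summarise(per_genome))
```

Scoring each genome separately and then summarising across genomes keeps a single poorly classified genome visible in the range, even when the median stays high.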

Funding
This study was supported by the:
  • NIHR Oxford Biomedical Research Centre
    • Principal Award Recipient: Not applicable
  • Rhodes Scholarships
    • Principal Award Recipient: Kumeren Nadaraj Govender

This is an open-access article distributed under the terms of the Creative Commons Attribution License. This article was made open access via a Publish and Read agreement between the Microbiology Society and the corresponding author’s institution.
/content/journal/mgen/10.1099/mgen.0.000886
2022-10-21
2024-12-03