
Abstract

This study aimed to provide efficient recognition of bacterial strains on personal computers from MinION (Nanopore) long-read data. Thanks to the fall in sequencing costs, bacteria can now be identified by whole-genome sequencing. MinION is a fast but highly error-prone sequencing device, and it remains a challenge to identify the strain content of unknown simple or complex microbial samples. This identification is heavily constrained by memory management and by the need for fast access to read and genome fragments. Our strategy involves three steps: indexing of known genomic sequences for one or several bacterial species; a request process that assigns a read to a strain by matching it to the closest reference genomes; and a final step that looks for a minimum set of strains that best explains the observed reads. We have applied our method, called , on 77 strains of . We worked with several genomic distances and obtained a detailed classification of the strains, together with a criterion for merging what we term 'sibling' strains, which are separated by only a few mutations. Overall, isolated strains can be safely recognized from MinION data. For mixtures of several non-sibling strains, results depend on strain abundance.
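The three steps above can be illustrated with a toy sketch. It is not the implementation described in the article: it assumes plain contiguous k-mers held in an in-memory dictionary, a "most shared k-mers" rule for read assignment, and a greedy set cover as a stand-in for the search for a minimum explaining set. The function names, the strain names, the sequences and the value of k are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(genomes, k):
    """Step 1 (assumed form): map each k-mer to the strains containing it."""
    index = defaultdict(set)
    for strain, seq in genomes.items():
        for km in kmers(seq, k):
            index[km].add(strain)
    return index


def assign_read(read, index, k):
    """Step 2 (assumed rule): strains sharing the most k-mers with the read."""
    hits = defaultdict(int)
    for km in kmers(read, k):
        for strain in index.get(km, ()):
            hits[strain] += 1
    if not hits:
        return frozenset()  # read matches no indexed genome
    best = max(hits.values())
    return frozenset(s for s, n in hits.items() if n == best)


def minimal_strain_set(assignments):
    """Step 3 (greedy stand-in): pick strains until every assigned read
    is explained by at least one chosen strain."""
    uncovered = [a for a in assignments if a]
    chosen = []
    while uncovered:
        counts = defaultdict(int)
        for candidates in uncovered:
            for strain in candidates:
                counts[strain] += 1
        # Strain explaining the most uncovered reads; ties broken by name.
        best = max(sorted(counts), key=lambda s: counts[s])
        chosen.append(best)
        uncovered = [a for a in uncovered if best not in a]
    return chosen


# Hypothetical toy data: three short "genomes" and four error-free "reads".
K = 4
genomes = {
    "strainA": "ACGTACGTTGCA",
    "strainB": "ACGTACGATGCA",
    "strainC": "TTTTGGGGCCCC",
}
reads = ["ACGTAC", "CGTTGC", "TTGGGG", "AAAAAA"]

index = build_index(genomes, K)
assignments = [assign_read(r, index, K) for r in reads]
result = minimal_strain_set(assignments)
print(result)
```

In this toy run the first read is ambiguous between two strains, and the final step resolves it by preferring the strain that also explains another read. A greedy cover does not guarantee a true minimum, and real MinION reads contain errors that exact k-mer matching tolerates poorly, so this sketch only conveys the shape of the pipeline.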

  • This is an open-access article distributed under the terms of the Creative Commons Attribution NonCommercial License. This article was made open access via a Publish and Read agreement between the Microbiology Society and the corresponding author’s institution.

/content/journal/mgen/10.1099/mgen.0.000654
2021-11-23
2024-04-26

Supplements

Supplementary material 1
