Abstract

Recent technological advances in DNA sequencing have made it possible to rapidly assay the diversity of a bacterial species present in a sample obtained from a hospital patient or an environmental source. For several applications, it is important to accurately identify the presence and estimate the relative abundances of the target organisms from short sequence reads obtained from a sample. This task is particularly challenging when the set of interest includes very closely related organisms, such as different strains of pathogenic bacteria, which can vary considerably in terms of virulence, resistance and spread. Using advanced Bayesian statistical modelling and computation techniques, we introduce a novel pipeline for bacterial identification that is shown to outperform the currently leading pipeline for this purpose. Our approach enables fast and accurate sequence-based identification of bacterial strains while using only modest computational resources. Hence, it provides a useful tool for a wide spectrum of applications, including rapid clinical diagnostics to distinguish among closely related strains causing nosocomial infections. The software implementation is available at https://github.com/PROBIC/BIB.

/content/journal/mgen/10.1099/mgen.0.000075
2016-08-25