Abstract

Rapidly assaying the diversity of a bacterial species present in a sample obtained from a hospital patient or an environmental source has become possible after recent technological advances in DNA sequencing. For several applications it is important to accurately identify the presence and estimate relative abundances of the target organisms from short sequence reads obtained from a sample. This task is particularly challenging when the set of interest includes very closely related organisms, such as different strains of pathogenic bacteria, which can vary considerably in terms of virulence, resistance and spread. Using advanced Bayesian statistical modelling and computation techniques we introduce a novel pipeline for bacterial identification that is shown to outperform the currently leading pipeline for this purpose. Our approach enables fast and accurate sequence-based identification of bacterial strains while using only modest computational resources. Hence it provides a useful tool for a wide spectrum of applications, including rapid clinical diagnostics to distinguish among closely related strains causing nosocomial infections. The software implementation is available at https://github.com/PROBIC/BIB.

/content/journal/mgen/10.1099/mgen.0.000075
2016-08-25
2019-10-20