Abstract

Thanks to recent advances in DNA sequencing technology, the cost and time of prokaryotic genome sequencing have decreased dramatically. It has repeatedly been reported that genome sequencing using high-throughput next-generation sequencing is prone to contamination because of its high depth of sequencing coverage. Although a few bioinformatics tools are available to detect potential contamination, they have inherent limitations because they use only protein-coding genes. Here we introduce a new algorithm, ContEst16S, that detects potential contamination using 16S rRNA genes from genome assemblies. We screened 69 745 prokaryotic genomes from the NCBI Assembly Database using ContEst16S and found that 594 were contaminated with sequences from bacteria, humans and plants. Of the predicted contaminated genomes, 8 % were not predicted by the existing protein-coding gene-based tool, implying that the two methods can be complementary in the detection of contamination. A web-based service of the algorithm is available at www.ezbiocloud.net/tools/contest16s.
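
The abstract does not spell out how ContEst16S decides that an assembly is contaminated, so the following is only a minimal sketch of the general idea of 16S-based screening, not the published algorithm. It assumes that each 16S rRNA fragment found in an assembly has already been assigned a genus by some upstream search, and it flags the assembly when the fragments disagree. The function name, the input format and the choice of genus as the comparison rank are all hypothetical.

```python
from collections import Counter


def flag_possible_contamination(hits):
    """Flag an assembly whose 16S rRNA fragments map to more than one genus.

    hits: list of (fragment_id, genus) pairs, one per 16S fragment found in
          the assembly. Both the format and the genus-level comparison are
          assumptions made for illustration, not part of ContEst16S.
    Returns (flagged, counts), where counts maps each genus to the number of
    fragments assigned to it.
    """
    counts = Counter(genus for _, genus in hits)
    # More than one distinct genus among the 16S fragments is treated here
    # as a sign of possible contamination.
    return len(counts) > 1, dict(counts)


# Hypothetical example: two fragments agree, one comes from another genus.
example_hits = [
    ("contig_1:16S", "Vibrio"),
    ("contig_7:16S", "Vibrio"),
    ("contig_42:16S", "Staphylococcus"),
]
flagged, counts = flag_possible_contamination(example_hits)
print(flagged, counts)
```

A real screen would also need to tolerate legitimate divergence between 16S copies within a single genome, which this sketch ignores.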

/content/journal/ijsem/10.1099/ijsem.0.001872
2017-06-01
2020-01-20
