
Abstract

Thanks to recent advances in DNA sequencing technology, the cost and time of prokaryotic genome sequencing have decreased dramatically. It has repeatedly been reported that genome sequencing using high-throughput next-generation sequencing is prone to contamination because of its high depth of sequencing coverage. Although a few bioinformatics tools are available to detect potential contamination, they have inherent limitations because they use only protein-coding genes. Here we introduce a new algorithm, called ContEst16S, that detects potential contamination using 16S rRNA genes from genome assemblies. We screened 69 745 prokaryotic genomes from the NCBI Assembly Database using ContEst16S and found that 594 were contaminated with sequences from bacteria, humans and plants. Of the genomes predicted to be contaminated, 8 % were not predicted by the existing protein-coding gene-based tool, implying that the two methods can be complementary in detecting contamination. A web-based service of the algorithm is available at www.ezbiocloud.net/tools/contest16s.
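The abstract does not describe the internals of ContEst16S, so the following is only a minimal sketch of the general idea of 16S-based screening, not the published algorithm. It assumes that the 16S rRNA fragments in an assembly have already been extracted and assigned a best-match genus by some reference search; the `Hit16S` record, the `flag_contamination` function and the `min_length` cut-off are all hypothetical names and values chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit16S:
    """One 16S rRNA fragment found in an assembly (hypothetical record)."""
    contig: str   # contig on which the fragment was found
    genus: str    # assumed: best-match genus from a reference 16S search
    length: int   # fragment length in bp


def flag_contamination(hits, min_length=400):
    """Flag an assembly whose usable 16S fragments disagree at genus level.

    Returns (is_flagged, majority_genus, minority_genera).
    min_length is an illustrative cut-off, not a value from the paper.
    """
    # Ignore very short fragments, which are hard to classify reliably.
    usable = [h for h in hits if h.length >= min_length]
    if not usable:
        return False, None, []
    counts = Counter(h.genus for h in usable)
    majority, _ = counts.most_common(1)[0]
    # Any genus other than the majority one is treated as a possible contaminant.
    minority = sorted(g for g in counts if g != majority)
    return bool(minority), majority, minority


hits = [
    Hit16S("contig_1", "Bacillus", 1500),
    Hit16S("contig_2", "Bacillus", 1480),
    Hit16S("contig_7", "Escherichia", 900),
    Hit16S("contig_9", "Staphylococcus", 200),  # too short, ignored
]
flagged, majority, minority = flag_contamination(hits)
print(flagged, majority, minority)
```

A real screen would also need to allow for legitimate sequence differences between multiple 16S rRNA gene copies within a single genome, which this sketch does not attempt to model.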

/content/journal/ijsem/10.1099/ijsem.0.001872
2017-06-01
2024-03-28