
Abstract

Genomics has set the basis for a variety of methodologies that produce high-throughput datasets identifying the different players that define gene regulation, particularly regulation of transcription initiation and operon organization. These datasets are available in public repositories, such as the Gene Expression Omnibus or ArrayExpress. However, accessing and navigating such a wealth of data is not straightforward. No resource currently exists that offers all available high- and low-throughput data on transcriptional regulation in Escherichia coli K-12 in a form that is easy to use both as whole datasets and as individual interactions and regulatory elements. RegulonDB (https://regulondb.ccg.unam.mx) began gathering high-throughput dataset collections in 2009, starting with transcription start sites, then adding ChIP-seq and gSELEX in 2012, with up to 99 different experimental high-throughput datasets available in 2019. In this paper we present a radical upgrade to more than 2000 high-throughput datasets, processed to facilitate their comparison, introducing up-to-date collections of transcription termination sites and transcription units, as well as transcription factor binding interactions derived from ChIP-seq, ChIP-exo, gSELEX and DAP-seq experiments, and expression profiles derived from RNA-seq experiments. For ChIP-seq experiments we offer both the data as presented by the authors and data uniformly processed in-house, enhancing their comparability as well as the traceability of the methods and the reproducibility of the results. Furthermore, we have expanded the tools available for browsing and visualization across and within datasets. We include comparisons against previously existing knowledge in RegulonDB from classic experiments, a nucleotide-resolution genome viewer, and an interface that enables users to browse datasets by querying their metadata.
A particular effort was made to automatically extract detailed experimental growth conditions by implementing an assisted curation strategy applying natural language processing and machine learning. We provide summaries with the total number of interactions found in each experiment, as well as tools to identify common results among different experiments. This is a long-awaited resource to make use of such a wealth of knowledge and advance our understanding of the biology of the model bacterium Escherichia coli K-12.

Funding
This study was supported by the:
  • DGAPA-UNAM (Award 369220)
    • Principal Award Recipient: Estefani Gaytan-Nuñez
  • DGAPA-UNAM (Award 18182)
    • Principal Award Recipient: Estefani Gaytan-Nuñez
  • CONACyT (Award 929687)
    • Principal Award Recipient: Claire Rioualen
  • Canadian Network for Research and Innovation in Machining Technology, Natural Sciences and Engineering Research Council of Canada
    • Principal Award Recipient: Gabriel Moreno-Hagelsieb
  • National Institute of General Medical Sciences (Award 5RO1GM131643)
    • Principal Award Recipient: Julio Collado-Vides
  • UNAM-PAPIIT (Award IA203420)
    • Principal Award Recipient: Carlos-Francisco Méndez-Cruz
  • DGAPA-UNAM (Award Postdoctoral Fellowship)
    • Principal Award Recipient: Lara Paloma

This is an open-access article distributed under the terms of the Creative Commons Attribution License.
/content/journal/mgen/10.1099/mgen.0.000833
2022-05-18
2024-12-13
Supplements

  • Supplementary material 1 (EXCEL)
  • Supplementary material 2 (EXCEL)
  • Supplementary material 3 (EXCEL)
  • Supplementary material 4 (EXCEL)
  • Supplementary material 5 (EXCEL)
  • Supplementary material 6 (EXCEL)
  • Supplementary material 7 (EXCEL)
  • Supplementary material 8 (EXCEL)
  • Supplementary material 9 (EXCEL)