Abstract
Genomics has set the basis for a variety of methodologies that produce high-throughput datasets identifying the different players that define gene regulation, particularly regulation of transcription initiation and operon organization. These datasets are available in public repositories such as the Gene Expression Omnibus and ArrayExpress. However, accessing and navigating such a wealth of data is not straightforward. No resource currently exists that offers all available high- and low-throughput data on transcriptional regulation in Escherichia coli K-12 in a form that is easy to use either as whole datasets or as individual interactions and regulatory elements. RegulonDB (https://regulondb.ccg.unam.mx) began gathering high-throughput dataset collections in 2009, starting with transcription start sites, then adding ChIP-seq and gSELEX in 2012, with up to 99 different experimental high-throughput datasets available in 2019. In this paper we present a radical upgrade to more than 2000 high-throughput datasets, processed to facilitate their comparison. The upgrade introduces up-to-date collections of transcription termination sites and transcription units, transcription factor binding interactions derived from ChIP-seq, ChIP-exo, gSELEX and DAP-seq experiments, and expression profiles derived from RNA-seq experiments. For ChIP-seq experiments we offer both the data as presented by the authors and data uniformly processed in-house, which enhances their comparability as well as the traceability of the methods and the reproducibility of the results. Furthermore, we have expanded the tools available for browsing and visualization across and within datasets. We include comparisons against previously existing knowledge in RegulonDB from classic experiments, a nucleotide-resolution genome viewer, and an interface that enables users to browse datasets by querying their metadata.
A particular effort was made to automatically extract detailed experimental growth conditions by implementing an assisted curation strategy that applies natural language processing and machine learning. We provide summaries with the total number of interactions found in each experiment, as well as tools to identify common results among different experiments. This long-awaited resource makes it possible to use this wealth of knowledge and to advance our understanding of the biology of the model bacterium E. coli K-12.
Funding
-
DGAPA-UNAM
(Award 369220)
- Principal Award Recipient: Estefani Gaytan-Nuñez
-
DGAPA-UNAM
(Award 18182)
- Principal Award Recipient: Estefani Gaytan-Nuñez
-
CONACyT
(Award 929687)
- Principal Award Recipient: Claire Rioualen
-
Canadian Network for Research and Innovation in Machining Technology, Natural Sciences and Engineering Research Council of Canada
- Principal Award Recipient: Gabriel Moreno-Hagelsieb
-
National Institute of General Medical Sciences
(Award 5R01GM131643)
- Principal Award Recipient: Julio Collado-Vides
-
UNAM-PAPIIT
(Award IA203420)
- Principal Award Recipient: Carlos-Francisco Méndez-Cruz
-
DGAPA-UNAM
(Award Postdoctoral Fellowship)
- Principal Award Recipient: Lara Paloma