- Volume 8, Issue 5, 2022
- Research Articles
-
- Genomic Methodologies
-
-
RegulonDB 11.0: Comprehensive high-throughput datasets on transcriptional regulation in Escherichia coli K-12
Víctor H. Tierrafría, Claire Rioualen, Heladia Salgado, Paloma Lara, Socorro Gama-Castro, Patrick Lally, Laura Gómez-Romero, Pablo Peña-Loredo, Andrés G. López-Almazo, Gabriel Alarcón-Carranza, Felipe Betancourt-Figueroa, Shirley Alquicira-Hernández, J. Enrique Polanco-Morelos, Jair García-Sotelo, Estefani Gaytan-Nuñez, Carlos-Francisco Méndez-Cruz, Luis J. Muñiz, César Bonavides-Martínez, Gabriel Moreno-Hagelsieb, James E. Galagan, Joseph T. Wade and Julio Collado-Vides
Genomics has set the basis for a variety of methodologies that produce high-throughput datasets identifying the different players that define gene regulation, particularly regulation of transcription initiation and operon organization. These datasets are available in public repositories, such as the Gene Expression Omnibus or ArrayExpress. However, accessing and navigating such a wealth of data is not straightforward. No resource currently exists that offers all available high- and low-throughput data on transcriptional regulation in Escherichia coli K-12 for easy use either as whole datasets or as individual interactions and regulatory elements. RegulonDB (https://regulondb.ccg.unam.mx) began gathering high-throughput dataset collections in 2009, starting with transcription start sites, then adding ChIP-seq and gSELEX in 2012, with up to 99 different experimental high-throughput datasets available in 2019. In this paper we present a radical upgrade to more than 2000 high-throughput datasets, processed to facilitate their comparison, introducing up-to-date collections of transcription termination sites, transcription units, as well as transcription factor binding interactions derived from ChIP-seq, ChIP-exo, gSELEX and DAP-seq experiments, besides expression profiles derived from RNA-seq experiments. For ChIP-seq experiments we offer both the data as presented by the authors and data uniformly processed in-house, enhancing their comparability, as well as the traceability of the methods and reproducibility of the results. Furthermore, we have expanded the tools available for browsing and visualization across and within datasets. We include comparisons against previously existing knowledge in RegulonDB from classic experiments, a nucleotide-resolution genome viewer, and an interface that enables users to browse datasets by querying their metadata. A particular effort was made to automatically extract detailed experimental growth conditions by implementing an assisted curation strategy applying natural language processing and machine learning. We provide summaries with the total number of interactions found in each experiment, as well as tools to identify common results among different experiments. This is a long-awaited resource to make use of such a wealth of knowledge and advance our understanding of the biology of the model bacterium E. coli K-12.
-
- Functional Genomics and Microbe–Niche Interactions
-
-
Small and intermediate size structural RNAs in the unicellular parasite Cryptosporidium parvum as revealed by sRNA-seq and comparative genomics
Small and intermediate-size noncoding RNAs (sRNAs and is-ncRNAs) have been shown to play important regulatory roles in the development of several eukaryotic organisms. However, they have not been thoroughly explored in Cryptosporidium parvum, an obligate zoonotic protist parasite responsible for the diarrhoeal disease cryptosporidiosis. Using Illumina sequencing of a small RNA library, a systematic identification of novel small and is-ncRNAs was performed in C. parvum excysted sporozoites. A total of 79 novel is-ncRNA candidates, comprising antisense, intergenic and intronic is-ncRNAs, were identified, among them 7 new small nucleolar RNAs (snoRNAs). Expression of select novel is-ncRNAs was confirmed by RT-PCR. Phylogenetic conservation was analysed using covariance models (CMs) in related Cryptosporidium and apicomplexan parasite genome sequences. A potential new type of small ncRNA derived from tRNA fragments was observed. Overall, a deep profiling analysis of novel is-ncRNAs in C. parvum and related species revealed structural features and conservation of these novel is-ncRNAs. Covariance models can be used to detect is-ncRNA genes in other closely related parasites. These findings provide important new sequences for additional functional characterization of novel is-ncRNAs in the protist pathogen C. parvum.
-
- Pathogens and Epidemiology
-
-
Mobility of antimicrobial resistance across serovars and disease presentations in non-typhoidal Salmonella from animals and humans in Vietnam
Non-typhoidal Salmonella (NTS) is a major cause of bacterial enterocolitis globally but also causes invasive bloodstream infections. Antimicrobial resistance (AMR) hampers the treatment of these infections and understanding how AMR spreads between NTS may help in developing effective strategies. We investigated NTS isolates associated with invasive disease, diarrhoeal disease and asymptomatic carriage in animals and humans from Vietnam. Isolates included multiple serovars and both common and rare phenotypic AMR profiles; long- and short-read sequencing was used to investigate the genetic mechanisms and genomic backgrounds associated with phenotypic AMR profiles. We demonstrate concordance between most AMR genotypes and phenotypes but identified large genotypic diversity in clinically relevant phenotypes and the high mobility potential of AMR genes (ARGs) in this setting. We found that 84 % of ARGs identified were located on plasmids, most commonly those containing IncHI1A_1 and IncHI1B(R27)_1_R27 replicons (33 %), and those containing IncHI2_1 and IncHI2A_1 replicons (31 %). The vast majority (95 %) of ARGs were found within 10 kbp of IS6/IS26 elements, which provide plasmids with a mechanism to exchange ARGs between plasmids and other parts of the genome. Whole genome sequencing with targeted long-read sequencing applied in a One Health context identified a comparatively limited number of insertion sequences and plasmid replicons associated with AMR. Therefore, in the context of NTS from Vietnam and likely for other settings as well, the mechanisms by which ARGs move contribute to a more successful AMR profile than the specific ARGs, facilitating the adaptation of bacteria to different environments or selection pressures.
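The proximity statistic reported above (the share of ARGs lying within 10 kbp of an IS6/IS26 element) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Feature` type, the function names and the example coordinates are hypothetical, the features are assumed to come from an existing annotation, and circular replicons are treated as linear for simplicity.

```python
from typing import List, NamedTuple, Optional


class Feature(NamedTuple):
    """An annotated interval (hypothetical layout, 0-based half-open)."""
    contig: str
    start: int
    end: int


def gap(a: Feature, b: Feature) -> Optional[int]:
    """Distance in bp between two features; 0 if they overlap,
    None if they sit on different contigs."""
    if a.contig != b.contig:
        return None
    return max(0, a.start - b.end, b.start - a.end)


def fraction_near(args: List[Feature], insertion_seqs: List[Feature],
                  max_gap: int = 10_000) -> float:
    """Fraction of ARGs with at least one IS element within max_gap bp."""
    if not args:
        return 0.0
    near = 0
    for arg in args:
        for ins in insertion_seqs:
            d = gap(arg, ins)
            if d is not None and d <= max_gap:
                near += 1
                break  # one nearby IS element is enough
    return near / len(args)


# Made-up example: one ARG 3 kbp from an IS element, one 44 kbp away,
# and one on a different contig.
example_args = [
    Feature("plasmid1", 1_000, 2_000),
    Feature("plasmid1", 50_000, 51_000),
    Feature("chromosome", 100, 900),
]
example_is = [Feature("plasmid1", 5_000, 5_800)]
example_fraction = fraction_near(example_args, example_is)
```

On a circular plasmid the distance should also be measured across the origin, which this sketch deliberately omits.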
-
-
-
Carriage prevalence and genomic epidemiology of Staphylococcus aureus among Native American children and adults in the Southwestern USA
Native American individuals in the Southwestern USA experience a higher burden of invasive Staphylococcus aureus disease than the general population. However, little is known about S. aureus carriage in these communities. A cross-sectional study was conducted to determine the carriage prevalence, risk factors and genomic epidemiology of S. aureus among Native American children (<5 years, n=121) and adults (≥18 years, n=167) in the Southwestern USA. Short- and long-read sequencing data were generated using Illumina and Oxford Nanopore Technologies platforms to produce high-quality hybrid assemblies, and antibiotic-resistance, virulence and pangenome analyses were performed. S. aureus carriage prevalence was 20.7 % among children, 30.2 % among adults 18–64 years and 16.7 % among adults ≥65 years. Risk factors among adults included recent surgery, prior S. aureus infection among household members, and recent use of gyms or locker rooms by household members. No risk factors were identified among children. The bacterial population structure was dominated by clonal complex 1 (CC1) (21.1 %), CC5 (22.2 %) and CC8 (22.2 %). Isolates from children and adults were intermixed throughout the phylogeny. While the S. aureus population was diverse, the carriage prevalence was comparable to that in the general USA population. Genomic and risk-factor data suggest household, community and healthcare transmission are important components of the local epidemiology.
-
-
-
Core-, pan- and accessory genome analyses of Clostridium neonatale: insights into genetic diversity
Clostridium neonatale is a potential opportunistic pathogen recovered from faecal samples in cases of necrotizing enterocolitis (NEC), a gastrointestinal disease affecting preterm neonates. Although the C. neonatale species description and name validation were published in 2018, comparative genomic analyses are lacking. In the present study, we provide the closed genome assembly of the C. neonatale ATCC BAA-265T (=250.09) reference strain with a manually curated functional annotation of the coding sequences. Pan-, core- and accessory genome analyses were performed using the complete 250.09 genome (4.7 Mb), three new assemblies (4.6–5.6 Mb), and five publicly available draft genome assemblies (4.6–4.7 Mb). The C. neonatale pan-genome contains 6840 genes, while the core-genome has 3387 genes. Pan-genome analysis revealed an ‘open’ state and genomic diversity. The strain-specific gene families ranged from five to 742 genes. Multiple mobile genetic elements were predicted, including a total of 201 genomic islands, 13 insertion sequence families, one CRISPR-Cas type I-B system and 15 predicted intact prophage signatures. Primary virulence classes including offensive, defensive, regulation of virulence-associated genes and non-specific virulence factors were identified. A tet(W/N/W) gene encoding a tetracycline resistance ribosomal protection protein and a 23S rRNA methyltransferase ermQ gene were identified in two different strains. Together, our results revealed genetic diversity and plasticity of C. neonatale genomes and provide a comprehensive view of this species' genomic features, paving the way for the characterization of its biological capabilities.
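The pan-, core- and accessory genome terms used above reduce to simple set operations once each genome is represented as a set of gene-family identifiers. The sketch below only illustrates those definitions; the function name, the strain labels and the family identifiers are hypothetical, and real analyses first cluster genes into families with a dedicated tool and often allow a tolerance when defining the core.

```python
from typing import Dict, Set


def pan_core_accessory(genomes: Dict[str, Set[str]]):
    """Split gene families into pan, core, accessory and strain-specific sets.

    genomes maps a strain name to the set of gene-family IDs it carries
    (hypothetical input; family clustering is assumed to be done already).
    """
    gene_sets = list(genomes.values())
    pan = set().union(*gene_sets)                              # in any genome
    core = set.intersection(*gene_sets) if gene_sets else set()  # in every genome
    accessory = pan - core                                     # in some, not all
    strain_specific = {
        name: genes - set().union(
            *(other for other_name, other in genomes.items() if other_name != name)
        )
        for name, genes in genomes.items()
    }
    return pan, core, accessory, strain_specific


# Made-up example with three strains.
example_genomes = {
    "strainA": {"fam1", "fam2", "fam3"},
    "strainB": {"fam1", "fam2", "fam4"},
    "strainC": {"fam1", "fam5"},
}
pan, core, accessory, strain_specific = pan_core_accessory(example_genomes)
```

A pan-genome is usually called 'open' when the pan set keeps growing as further genomes are added, which can be explored by calling the function on increasing subsets of strains.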
-
-
-
The characterization of Moraxella catarrhalis carried in the general population
Moraxella catarrhalis is a common cause of respiratory tract infection, particularly otitis media in children, whilst it is also associated with the onset of exacerbation in chronic obstructive pulmonary disease in adults. Despite the need for an efficacious vaccine against M. catarrhalis, no candidates have progressed to clinical trial. This study, therefore, aimed to characterize the diversity of M. catarrhalis isolated from the upper respiratory tract of healthy children and adults, to gain a better understanding of the epidemiology of M. catarrhalis and the distribution of genes associated with virulence factors, to aid vaccine efforts. Isolates were sequenced and the presence of target genes reported. Contrary to prevailing data, this study found that lipooligosaccharide (LOS) B serotypes are not exclusively associated with 16S type 1. In addition, a particularly low prevalence of LOS B and high prevalence of LOS C serotypes was observed. M. catarrhalis isolates showed low prevalence of antimicrobial resistance and a high gene prevalence for a number of the target genes investigated: ompB2 (also known as copB), ompCD, ompE, ompG1a, ompG1b, mid (also known as hag), mcaP, m35, tbpA, lbpA, tbpB, lbpB, msp22, msp75 and msp78, afeA, pilA, pilQ, pilT, mod, oppA, sbp2, mcmA and mclS.
-
-
-
Investigating plant disease outbreaks with long-read metagenomics: sensitive detection and highly resolved phylogenetic reconstruction applied to Xylella fastidiosa
Early disease detection is a prerequisite for enacting effective interventions for disease control. Strains of the bacterial plant pathogen Xylella fastidiosa have recurrently spread to new crops in new countries causing devastating outbreaks. So far, investigation of outbreak strains and highly resolved phylogenetic reconstruction have required whole-genome sequencing of pure bacterial cultures, which are challenging to obtain due to the fastidious nature of X. fastidiosa. Here, we show that culture-independent metagenomic sequencing, using the Oxford Nanopore Technologies MinION long-read sequencer, can sensitively and specifically detect the causative agent of Pierce’s disease of grapevine, X. fastidiosa subspecies fastidiosa. Using a DNA sample from a grapevine in Virginia, USA, it was possible to obtain a metagenome-assembled genome (MAG) of sufficient quality for phylogenetic reconstruction with SNP resolution. The analysis placed the MAG in a clade with isolates from Georgia, USA, suggesting introduction of X. fastidiosa subspecies fastidiosa to Virginia from the south-eastern USA. This proof of concept study, thus, revealed that metagenomic sequencing can replace culture-dependent genome sequencing for reconstructing transmission routes of bacterial plant pathogens.
-
-
-
Enterovirus D68 epidemic, UK, 2018, was caused by subclades B3 and D1, predominantly in children and adults, respectively, with both subclades exhibiting extensive genetic diversity
Enterovirus D68 (EV-D68) has recently been identified in biennial epidemics coinciding with diagnoses of non-polio acute flaccid paralysis/myelitis (AFP/AFM). We investigated the prevalence, genetic relatedness and associated clinical features of EV-D68 in 193 EV-positive samples from 193 patients in late 2018, UK. EV-D68 was detected in 83 (58 %) of 143 confirmed EV-positive samples. Sequencing and phylogenetic analysis revealed extensive genetic diversity, split between subclades B3 (n=50) and D1 (n=33), suggesting epidemiologically unrelated infections. B3 predominated in children and younger adults, and D1 in older adults and the elderly (P=0.0009). Clinical presentation indicated causation or exacerbation of respiratory distress in 91.4 % of EV-D68-positive individuals, principally cough (75.3 %), shortness of breath (56.8 %), coryza (48.1 %), wheeze (46.9 %), supplemental oxygen required (46.9 %) and fever (38.9 %). Two cases of AFM were observed, one with EV-D68 detectable in the cerebrospinal fluid, but otherwise neurological symptoms were rarely reported (n=4). Both AFM cases and all additional instances of intensive care unit (ICU) admission (n=5) were seen in patients infected with EV-D68 subclade B3. However, due to the infrequency of severe infection in our cohort, statistical significance could not be assessed.
-
-
-
Genomic epidemiology and temperature dependency of hypermucoviscous Klebsiella pneumoniae in Japan
Klebsiella pneumoniae (Kp) has emerged as a global life-threatening pathogen owing to its multidrug resistance and hypervirulence phenotype. Several fatal outbreaks of carbapenem-resistant hypervirulent Kp have been reported recently. Hypermucoviscosity (HMV) is a phenotype commonly associated with hypervirulence of Kp, which is usually regulated by rmpA or rmpA2 (regulators of the mucoid phenotype). Here, we found that temperature was important in the HMV phenotype of Kp, and the impact of temperature on HMV was not uniform among strains. We investigated the HMV phenotype at 37 °C and room temperature (20–25 °C) in 170 clinically isolated hypermucoviscous Kp strains in Japan and analysed the association between the HMV phenotype, virulence genes and antimicrobial resistance (AMR) genes. String length distribution at different temperatures was correlated with the genomic population of Kp. The strains carrying rmpA/rmpA2 frequently showed the HMV phenotype at 37 °C, while the strains negative for these genes tended to show the HMV phenotype at room temperature. Hypervirulent Kp clusters carrying rmpA/rmpA2 without extended-spectrum beta-lactamases (ESBL)/carbapenemases produced higher string lengths at 37 °C than at room temperature, and were mostly isolated from the respiratory tract. Other HMV strains showed distinct characteristics of not carrying rmpA/rmpA2 but were positive for ESBL/carbapenemases, with a higher string length at room temperature than at 37 °C, and were frequently isolated from bloodstream infections. In total, 21 (13.5 %) HMV isolates carried ESBL and carbapenemases, among which five isolates were carbapenem-resistant hypervirulent Kp with a pLVPK-like plasmid (an epidemic virulence plasmid) and a pKPI-6-like plasmid (an epidemic bla IMP-6-bearing plasmid in Japan), suggesting the convergence of worldwide hypervirulence and epidemic AMR in Japan.
-
-
-
Helicobacter cinaedi is a human-adapted lineage in the Helicobacter cinaedi/canicola/‘magdeburgensis’ complex
Helicobacter cinaedi is an enterohepatic Helicobacter that causes bacteremia and other diseases in humans. While H. cinaedi-like strains are isolated from animals, including dog isolates belonging to a recently proposed H. canicola, little is known about the genetic differences between H. cinaedi and these animal isolates. Here, we sequenced 43 H. cinaedi- or H. canicola-like strains isolated from humans, hamsters, rats and dogs and collected 81 genome sequences of H. cinaedi, H. canicola and other enterohepatic Helicobacter strains from public databases. Genomic comparison of these strains identified four distinct clades (clades I–IV) in the H. cinaedi/canicola/‘magdeburgensis’ (HCCM) complex. Among these, clade I corresponds to H. cinaedi sensu stricto and represents a human-adapted lineage in the complex. We identified several genomic features unique to clade I. They include the accumulation of antimicrobial resistance-related mutations that reflects the human association of clade I and the larger genome size and the presence of a CRISPR-Cas system and multiple toxin-antitoxin and restriction-modification systems, both of which indicate the contribution of horizontal gene transfer to the evolution of clade I. In addition, nearly all clade I strains but only a few strains belonging to one minor clade contained a highly variable genomic region encoding a type VI secretion system (T6SS), which could play important roles in gut colonization by killing competitors or inhibiting their growth. We also developed a method to systematically search for H. cinaedi sequences in large metagenome data sets based on the results of genome comparison. Using this method, we successfully identified multiple HCCM complex-containing human faecal metagenome samples and obtained the sequence information covering almost the entire genome of each strain. Importantly, all were clade I strains, supporting our conclusion that H. cinaedi sensu stricto is a human-adapted lineage in the HCCM complex.
-
- Evolution and Responses to Interventions
-
-
Impacts of Mycoplasma agalactiae restriction-modification systems on pan-epigenome dynamics and genome plasticity
DNA methylations play an important role in the biology of bacteria. Often associated with restriction modification (RM) systems, they are important drivers of bacterial evolution interfering in horizontal gene transfer events by providing a defence against foreign DNA invasion or by favouring genetic transfer through production of recombinogenic DNA ends. Little is known regarding the methylome of the Mycoplasma genus, which encompasses several pathogenic species with small genomes. Here, genome-wide detection of DNA methylations was conducted using single molecule real-time (SMRT) and bisulphite sequencing in several strains of Mycoplasma agalactiae, an important ruminant pathogen and a model organism. Combined with whole-genome analysis, this allowed the identification of 19 methylated motifs associated with three orphan methyltransferases (MTases) and eight RM systems. All systems had a homolog in at least one phylogenetically distinct Mycoplasma spp. Our study also revealed that several superimposed genetic events may participate in the M. agalactiae dynamic epigenomic landscape. These included (i) DNA shuffling and frameshift mutations that affect the MTase and restriction endonuclease content of a clonal population and (ii) gene duplication, erosion, and horizontal transfer that modulate MTase and RM repertoires of the species. Some of these systems were experimentally shown to play a major role in mycoplasma conjugative, horizontal DNA transfer. While the versatility of DNA methylation may contribute to regulating essential biological functions at cell and population levels, RM systems may be key in mycoplasma genome evolution and adaptation by controlling horizontal gene transfers.
-
- Short Communications
-
- Genomic Methodologies
-
-
Target-enrichment sequencing yields valuable genomic data for challenging-to-culture bacteria of public health importance
Genomic data contribute invaluable information to the epidemiological investigation of pathogens of public health importance. However, whole-genome sequencing (WGS) of bacteria typically relies on culture, which represents a major hurdle for generating such data for a wide range of species for which culture is challenging. In this study, we assessed the use of culture-free target-enrichment sequencing as a method for generating genomic data for two bacterial species: (1) Bacillus anthracis, which causes anthrax in both people and animals and whose culture requires high-level containment facilities; and (2) Mycoplasma amphoriforme, a fastidious emerging human respiratory pathogen. We obtained high-quality genomic data for both species directly from clinical samples, with sufficient coverage (>15×) for confident variant calling over at least 80 % of the baited genomes for over two thirds of the samples tested. Higher qPCR cycle threshold (Ct) values (indicative of lower pathogen concentrations in the samples), pooling libraries prior to capture, and lower captured library concentration were all statistically associated with lower capture efficiency. The Ct value had the highest predictive value, explaining 52 % of the variation in capture efficiency. Samples with Ct values ≤30 were over six times more likely to achieve the threshold coverage than those with a Ct >30. We conclude that target-enrichment sequencing provides a valuable alternative to standard WGS following bacterial culture and creates opportunities for an improved understanding of the epidemiology and evolution of many clinically important pathogens for which culture is challenging.
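The coverage criterion stated above (depth >15× over at least 80 % of the baited genome) can be expressed as a short check on per-base depths. This is a minimal sketch under stated assumptions, not the study's code: the function names are hypothetical, and the depths are assumed to be one integer per baited position, for example as exported by a depth-reporting tool.

```python
from typing import Sequence


def breadth_of_coverage(depths: Sequence[int], min_depth: int = 15) -> float:
    """Fraction of positions whose depth is strictly greater than min_depth."""
    if not depths:
        return 0.0
    covered = sum(1 for d in depths if d > min_depth)
    return covered / len(depths)


def passes_threshold(depths: Sequence[int], min_depth: int = 15,
                     min_breadth: float = 0.80) -> bool:
    """True if the sample reaches the breadth threshold at the given depth."""
    return breadth_of_coverage(depths, min_depth) >= min_breadth


# Made-up samples: 85 % of positions at 20x, versus 79 % at 16x.
good_sample = [20] * 85 + [5] * 15
poor_sample = [16] * 79 + [15] * 21
```

With these inputs `passes_threshold(good_sample)` is true and `passes_threshold(poor_sample)` is false, because positions at exactly 15× do not count towards the strict >15× criterion.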
-
- Functional Genomics and Microbe–Niche Interactions
-
-
Profiling the plasmid conjugation potential of urinary Escherichia coli
Escherichia coli is often associated with urinary tract infection (UTI). Antibiotic resistance in E. coli is an ongoing challenge in managing UTI. Extrachromosomal elements – plasmids – are vectors for clinically relevant traits, such as antibiotic resistance, with conjugation being one of the main methods for horizontal propagation of plasmids in bacterial populations. Targeting of conjugation components has been proposed as a strategy to curb the spread of plasmid-borne antibiotic resistance. Understanding the types of conjugative systems present in urinary E. coli isolates is fundamental to assessing the viability of this strategy. In this study, we profile two well-studied conjugation systems (F-type and P-type) in the draft genomes of 65 urinary isolates of E. coli obtained from the bladder urine of adult women with and without UTI-like symptoms. Most of these isolates contained plasmids and we found that conjugation genes were abundant/ubiquitous, diverse and often associated with IncF plasmids. To validate conjugation of these urinary plasmids, the plasmids from two urinary isolates, UMB1223 (predicted to have F-type genes) and UMB1284 (predicted to have P-type genes), were transferred by conjugation into the K-12 E. coli strain MG1655. Overall, the findings of this study support the notion that care should be taken in targeting any individual component of a urinary E. coli isolate’s conjugation system, given the inherent mechanistic redundancy, gene diversity and different types of conjugation systems in this population.
-
- Methods
-
- Genomic Methodologies
-
-
Enabling genomic island prediction and comparison in multiple genomes to investigate bacterial evolution and outbreaks
Outbreaks of virulent and/or drug-resistant bacteria have a significant impact on human health and major economic consequences. Genomic islands (GIs; defined as clusters of genes of probable horizontal origin) are of high interest because they disproportionately encode virulence factors, some antimicrobial-resistance (AMR) genes, and other adaptations of medical or environmental interest. While microbial genome sequencing has become rapid and inexpensive, current computational methods for GI analysis are not amenable to rapid, accurate, user-friendly and scalable comparative analysis of sets of related genomes. To help fill this gap, we have developed IslandCompare, an open-source computational pipeline for GI prediction and comparison across several to hundreds of bacterial genomes. A dynamic and interactive visualization strategy displays a bacterial core-genome phylogeny, with bacterial genomes linearly displayed at the phylogenetic tree leaves. Genomes are overlaid with GI predictions and AMR determinants from the Comprehensive Antibiotic Resistance Database (CARD), and regions of similarity between the genomes are also displayed. GI predictions are performed using Sigi-HMM and IslandPath-DIMOB, the two most precise GI prediction tools based on nucleotide composition biases, as well as a novel BLAST-based consistency step to improve cross-genome prediction consistency. GIs across genomes sharing sequence similarity are grouped into clusters, further aiding comparative analysis and visualization of acquisition and loss of mobile GIs in specific sub-clades. IslandCompare is open-source software that is containerized for local use and also available via a user-friendly, web-based interface to allow direct use by bioinformaticians, biologists and clinicians (at https://islandcompare.ca).
-
-
-
Whokaryote: distinguishing eukaryotic and prokaryotic contigs in metagenomes based on gene structure
Metagenomics has become a prominent technology to study the functional potential of all organisms in a microbial community. Most studies focus on the bacterial content of these communities, while ignoring eukaryotic microbes. Indeed, many metagenomics analysis pipelines silently assume that all contigs in a metagenome are prokaryotic, likely resulting in less accurate annotation of eukaryotes in metagenomes. Early detection of eukaryotic contigs allows for eukaryote-specific gene prediction and functional annotation. Here, we developed a classifier that distinguishes eukaryotic from prokaryotic contigs based on foundational differences between these taxa in terms of gene structure. We first developed Whokaryote, a random forest classifier that uses intergenic distance, gene density and gene length as the most important features. We show that, with an estimated recall, precision and accuracy of 94, 96 and 95 %, respectively, this classifier with features grounded in biology can perform almost as well as the classifiers EukRep and Tiara, which use k-mer frequencies as features. By retraining our classifier with Tiara predictions as an additional feature, the weaknesses of both types of classifiers are compensated; the result is Whokaryote+Tiara, an enhanced classifier that outperforms all individual classifiers, with an F1 score of 0.99 for both eukaryotes and prokaryotes, while still being fast. In a reanalysis of metagenome data from a disease-suppressive plant endospheric microbial community, we show how using Whokaryote+Tiara to select contigs for eukaryotic gene prediction facilitates the discovery of several biosynthetic gene clusters that were missed in the original study. Whokaryote (+Tiara) is wrapped in an easily installable package and is freely available from https://github.com/LottePronk/whokaryote.
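The three gene-structure features named above (intergenic distance, gene density and gene length) can be computed per contig from predicted gene coordinates. The following is an illustrative sketch only, not Whokaryote's implementation: the function name, the input layout and the exact feature definitions (means, genes per kb) are assumptions, and the actual tool may define and combine its features differently.

```python
from statistics import mean
from typing import Dict, List, Tuple


def contig_features(contig_length: int,
                    genes: List[Tuple[int, int]]) -> Dict[str, float]:
    """Summarize gene structure on one contig.

    genes holds (start, end) pairs in 0-based half-open coordinates
    (hypothetical input, e.g. parsed from a gene-prediction output).
    """
    genes = sorted(genes)
    lengths = [end - start for start, end in genes]
    # Gap between each gene and the next one; overlapping genes count as 0.
    gaps = [
        max(0, next_start - prev_end)
        for (_, prev_end), (next_start, _) in zip(genes, genes[1:])
    ]
    return {
        "gene_density_per_kb": len(genes) / contig_length * 1000,
        "mean_gene_length": mean(lengths) if lengths else 0.0,
        "mean_intergenic_distance": mean(gaps) if gaps else 0.0,
    }


# Made-up contig of 10 kb carrying three predicted genes.
example = contig_features(10_000, [(100, 1_000), (1_100, 2_000), (2_500, 4_000)])
```

A feature table built this way for contigs of known origin could then be passed to any random forest implementation; prokaryotic contigs are generally expected to show higher gene density and shorter intergenic distances than eukaryotic ones.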
-