- Volume 6, Issue 11, 2020
- Research Article
-
- Microbial Evolution and Epidemiology
- Population Genomics
-
-
Evolutionary history and current distribution of the West Mediterranean lineage of Brucella melitensis in Italy
Ovine and caprine brucellosis, caused by Brucella melitensis, is one of the world’s most widespread zoonoses and is a major cause of economic losses in domestic ruminant production. In Italy, the disease remains endemic in several southern provinces, despite an ongoing brucellosis eradication programme. In this study, we used whole-genome sequencing to detail the genetic diversity of circulating strains, and to examine the origins of the predominant sub-lineages of B. melitensis in Italy. We reconstructed a global phylogeny of B. melitensis, strengthened by 339 new whole-genome sequences from Italian isolates collected from 2011 to 2018 as part of a national livestock surveillance programme. All Italian strains belonged to the West Mediterranean lineage, which further divided into two major clades that diverged roughly between the 5th and 7th centuries. We observed that Sicily serves as a brucellosis burden hotspot, giving rise to several distinct sub-lineages. More than 20 putative outbreak clusters of ovine and caprine brucellosis were identified, several of which persisted over the 8-year survey period despite an aggressive brucellosis eradication campaign. While the outbreaks in Central and Northern Italy were generally associated with introductions of single clones of B. melitensis and their subsequent dissemination within neighbouring territories, we observed weak geographical segregation of genotypes in the southern regions. Biovar determination, recommended in routine analysis of all Brucella strains by the World Organisation for Animal Health (OIE), could not discriminate among the four main global clades. This demonstrates a need to update the guidelines used for monitoring B. melitensis transmission and spread, at both the national and international level, and to include whole-genome-based typing as the principal method for identification and tracing of brucellosis outbreaks.
-
-
-
Whole-genome epidemiology links phage-mediated acquisition of a virulence gene to the clonal expansion of a pandemic Salmonella enterica serovar Typhimurium clone
Epidemic and pandemic clones of bacterial pathogens with distinct characteristics continually emerge, replacing those previously dominant through mechanisms that remain poorly characterized. Here, whole-genome-sequencing-powered epidemiology linked horizontal transfer of a virulence gene, sopE, to the emergence and clonal expansion of a new epidemic Salmonella enterica serovar Typhimurium (S. Typhimurium) clone. The sopE gene is sporadically distributed within the genus Salmonella and rare in S. enterica Typhimurium lineages, but was acquired multiple times during clonal expansion of the currently dominant pandemic monophasic S. Typhimurium sequence type (ST) 34 clone. Ancestral state reconstruction and time-scaled phylogenetic analysis indicated that sopE was not present in the common ancestor of the epidemic clade, but later acquisition resulted in increased clonal expansion of sopE-containing clones that was temporally associated with emergence of the epidemic, consistent with increased fitness. The sopE gene was mainly associated with a temperate bacteriophage mTmV, but recombination with other bacteriophages and apparent horizontal gene transfer of the sopE gene cassette resulted in distribution among at least four mobile genetic elements within the monophasic S. enterica Typhimurium ST34 epidemic clade. The mTmV prophage lysogenic transfer to other S. enterica serovars in vitro was limited, but included the common pig-associated S. enterica Derby (S. Derby). This may explain mTmV in S. Derby co-circulating on farms with monophasic S. Typhimurium ST34, highlighting the potential for further transfer of the sopE virulence gene in nature. We conclude that whole-genome epidemiology pinpoints potential drivers of evolutionary and epidemiological dynamics during pathogen emergence, and identifies targets for subsequent research in epidemiology and bacterial pathogenesis.
-
- Communicable Disease Genomics
-
-
Transmission analysis of a large tuberculosis outbreak in London: a mathematical modelling study using genomic data
Outbreaks of tuberculosis (TB) – such as the large isoniazid-resistant outbreak centred on London, UK, which originated in 1995 – provide excellent opportunities to model transmission of this devastating disease. Transmission chains for TB are notoriously difficult to ascertain, but mathematical modelling approaches, combined with whole-genome sequencing data, have strong potential to contribute to transmission analyses. Using such data, we aimed to reconstruct transmission histories for the outbreak using a Bayesian approach, and to use machine-learning techniques with patient-level data to identify the key covariates associated with transmission. Using a transmission reconstruction method that accounts for phylogenetic uncertainty, we identified 21 transmission events with reasonable confidence, 9 of which had zero SNP distance; the maximum distance was 3. Patient age, alcohol abuse and history of homelessness were found to be the most important predictors of being credible TB transmitters.
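The SNP distances quoted in this abstract are counts of positions at which two aligned genomes differ. The following is a minimal sketch of that count only, not the study's Bayesian reconstruction method. The function name `snp_distance`, the rule of skipping ambiguous bases and the toy sequences are all assumptions made for illustration.

```python
def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences carry different bases.

    Assumption: positions where either sequence has an ambiguous base
    (anything other than A, C, G or T) are skipped rather than counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid and a != b
    )


# Hypothetical toy alignment; these are not sequences from the outbreak.
toy = {
    "case_1": "ACGTACGTAC",
    "case_2": "ACGTACGTAC",
    "case_3": "ACGTTCGNAC",
}

d_12 = snp_distance(toy["case_1"], toy["case_2"])
d_13 = snp_distance(toy["case_1"], toy["case_3"])
```

Here `case_1` and `case_2` are identical, and `case_1` and `case_3` differ at one called position; the `N` in `case_3` is ignored under the stated assumption.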
-
-
-
Optimized use of Oxford Nanopore flowcells for hybrid assemblies
Hybrid assemblies are highly valuable for studies of Enterobacteriaceae due to their ability to fully resolve the structure of mobile genetic elements, such as plasmids, which are involved in the carriage of clinically important genes (e.g. those involved in antimicrobial resistance/virulence). The widespread application of this technique is currently primarily limited by cost. Recent data have suggested that non-inferior, and even superior, hybrid assemblies can be produced using a fraction of the total output from a multiplexed nanopore [Oxford Nanopore Technologies (ONT)] flowcell run. In this study we sought to determine the optimal minimal running time for flowcells when acquiring reads for hybrid assembly. We then evaluated whether the ONT wash kit might allow users to exploit shorter running times by sequencing multiple libraries per flowcell. After 24 h of sequencing, most chromosomes and plasmids had circularized and there was no benefit associated with longer running times. Quality was similar at 12 h, suggesting that shorter running times are likely to be acceptable for certain applications (e.g. plasmid genomics). The ONT wash kit was highly effective in removing DNA between libraries. Contamination between libraries did not appear to affect subsequent hybrid assemblies, even when the same barcodes were used successively on a single flowcell. Utilizing shorter run times in combination with between-library nuclease washes allows at least 36 Enterobacteriaceae isolates to be sequenced per flowcell, significantly reducing the per-isolate sequencing cost. Ultimately this will facilitate large-scale studies utilizing hybrid assembly, advancing our understanding of the genomics of key human pathogens.
-
-
-
Geographically structured genomic diversity of non-human primate-infecting Treponema pallidum subsp. pertenue
Benjamin Mubemba, Jan F. Gogarten, Verena J. Schuenemann, Ariane Düx, Alexander Lang, Kathrin Nowak, Kamilla Pléh, Ella Reiter, Markus Ulrich, Anthony Agbor, Gregory Brazzola, Tobias Deschner, Paula Dieguez, Anne-Céline Granjon, Sorrel Jones, Jessica Junker, Erin Wessling, Mimi Arandjelovic, Hjalmar Kuehl, Roman M. Wittig, Fabian H. Leendertz and Sébastien Calvignac-Spencer
Many non-human primate species in sub-Saharan Africa are infected with Treponema pallidum subsp. pertenue, the bacterium causing yaws in humans. In humans, yaws is often characterized by lesions of the extremities and face, while T. pallidum subsp. pallidum causes venereal syphilis and is typically characterized by primary lesions on the genital, anal or oral mucosae. It remains unclear whether other T. pallidum subspecies found in humans also occur in non-human primates and how the genomic diversity of non-human primate T. pallidum subsp. pertenue lineages is distributed across hosts and space. We observed orofacial and genital lesions in sooty mangabeys (Cercocebus atys) in Taï National Park, Côte d’Ivoire and collected swabs and biopsies from symptomatic animals. We also collected non-human primate bones from 8 species in Taï National Park and 16 species from 11 other sites across sub-Saharan Africa. Samples were screened for T. pallidum DNA using polymerase chain reactions (PCRs) and we used in-solution hybridization capture to sequence T. pallidum genomes. We generated three nearly complete T. pallidum genomes from biopsies and swabs and detected treponemal DNA in bones of six non-human primate species in five countries, allowing us to reconstruct three partial genomes. Phylogenomic analyses revealed that both orofacial and genital lesions in sooty mangabeys from Taï National Park were caused by T. pallidum subsp. pertenue. We showed that T. pallidum subsp. pertenue has infected non-human primates in Taï National Park for at least 28 years and has been present in two non-human primate species that had not been described as T. pallidum subsp. pertenue hosts in this ecosystem, western chimpanzees (Pan troglodytes verus) and western red colobus (Piliocolobus badius), complementing clinical evidence that started accumulating in Taï National Park in 2014. More broadly, simian T. pallidum subsp. pertenue strains did not form monophyletic clades based on host species or the symptoms caused, but rather clustered based on geography. Geographical clustering of T. pallidum subsp. pertenue genomes might be compatible with cross-species transmission of T. pallidum subsp. pertenue within ecosystems or environmental exposure, leading to the acquisition of closely related strains. Finally, we found no evidence for mutations that confer antimicrobial resistance.
-
- Microbial Communities
- Environmental
-
-
Diversity and evolutionary dynamics of spore-coat proteins in spore-forming species of Bacillales
Among members of the Bacillales order, there are several species capable of forming a structure called an endospore. Endospores enable bacteria to survive under unfavourable growth conditions and germinate when environmental conditions are favourable again. Spore-coat proteins are found in a multilayered proteinaceous structure encasing the spore core and the cortex. They are involved in coat assembly, cortex synthesis and germination. Here, we aimed to determine the diversity and evolutionary processes that have influenced spore-coat genes in various spore-forming species of Bacillales using an in silico approach. For this, we used sequence similarity searching algorithms to determine the diversity of coat genes across 161 genomes of Bacillales. The results suggest that among Bacillales, there is a well-conserved core genome, composed mainly of morphogenetic coat proteins and spore-coat proteins involved in germination. However, some spore-coat proteins are taxon-specific. The best-conserved genes among different species may promote adaptation to changeable environmental conditions. Because most of the Bacillus species harbour complete or almost complete sets of spore-coat genes, we focused on this genus in greater depth. Phylogenetic reconstruction revealed eight monophyletic groups in the Bacillus genus, of which three are newly discovered. We estimated the selection pressures acting on spore-coat genes in these monophyletic groups using classical and modern approaches and detected horizontal gene transfer (HGT) events, which have been further confirmed by scanning the genomes to find traces of insertion sequences. Although most of the genes are under purifying selection, there are several cases with individual sites evolving under positive selection. Finally, the HGT results confirm that sporulation is an ancestral feature in Bacillus.
-
- Microbe-Niche Interactions
- Environmental Niche Adaptation
-
-
In silico analysis of the chemotactic system of Agrobacterium tumefaciens
Agrobacterium tumefaciens is an efficient tool for creating transgenic host plants. The first step in the genetic transformation process involves A. tumefaciens chemotaxis, which is crucial to the survival of A. tumefaciens in changeable, harsh and even contaminated soil environments. However, a systematic study of its chemotactic signalling pathway is still lacking. In this study, the distribution and classification of chemotactic genes in the model A. tumefaciens C58 and 21 other strains were annotated. Local BLAST was used for comparative genomics, and HMMER was used for predicting protein domains. Chemotactic phenotypes for knockout mutants of ternary signalling complexes in A. tumefaciens C58 were evaluated using a swim agar plate. A major cluster, in which chemotaxis genes were consistently organized as MCP (methyl-accepting chemotaxis protein), CheS, CheY1, CheA, CheR, CheB, CheY2 and CheD, was found in A. tumefaciens, but two coupling CheW proteins were located outside the ‘che’ cluster. In the ternary signalling complexes, the absence of MCP atu0514 significantly impaired A. tumefaciens chemotaxis, and the absence of CheA (atu0517) or the deletion of both CheWs abolished chemotaxis. A total of 465 MCPs were found in the 22 strains, and the cytoplasmic domains of these MCPs were composed of 38 heptad repeats. High homology was observed between the chemotactic systems of the 22 A. tumefaciens strains, with individual differences in the gene and receptor protein distributions, possibly related to their ecological niches. This preliminary study describes the chemotactic system of A. tumefaciens, and provides some reference for A. tumefaciens sensing and chemotaxis to exogenous signals.
-
- Pathogenesis
-
-
Analysis of complete Campylobacter concisus genomes identifies genomospecies features, secretion systems and novel plasmids and their association with severe ulcerative colitis
Campylobacter concisus is an emerging enteric pathogen that is associated with several gastrointestinal diseases, such as inflammatory bowel disease (IBD), which includes Crohn’s disease (CD) and ulcerative colitis (UC). Currently, only three complete C. concisus genomes are available and more complete C. concisus genomes are needed in order to better understand the genomic features and pathogenicity of this emerging pathogen. DNA extracted from 22 C. concisus strains was subjected to Oxford Nanopore genome sequencing. Complete genome assembly was performed using Nanopore genome data in combination with previously reported short-read Illumina data. Genome features of complete C. concisus genomes were analysed using bioinformatic tools. The enteric disease associations of C. concisus plasmids were examined using 239 C. concisus strains and confirmed using PCRs. Proteomic analysis was used to examine T6SS secreted proteins. We successfully obtained 13 complete C. concisus genomes in this study. Analysis of 16 complete C. concisus genomes (3 from public databases) identified multiple novel plasmids. The pSma1 plasmid was found to be associated with severe UC. Sec-SRP, Tat and T6SS were found to be the main secretion systems in C. concisus and proteomic data showed a functional T6SS despite the lack of ClpV. T4SS was found in 25% of complete C. concisus genomes. This study also found that GS2 strains had larger genomes and higher GC content than GS1 strains and more often had plasmids. In conclusion, this study provides fundamental genomic data for understanding C. concisus plasmids, genomospecies features, evolution, secretion systems and pathogenicity.
-
-
-
Host adaptation and microbial competition drive Ralstonia solanacearum phylotype I evolution in the Republic of Korea
Bacterial wilt caused by the Ralstonia solanacearum species complex (RSSC) threatens the cultivation of important crops worldwide. We sequenced 30 RSSC phylotype I (R. pseudosolanacearum) strains isolated from pepper (Capsicum annuum) and tomato (Solanum lycopersicum) across the Republic of Korea. These isolates span the diversity of phylotype I, have extensive effector repertoires and are subject to frequent recombination. Recombination hotspots among South Korean phylotype I isolates include multiple predicted contact-dependent inhibition loci, suggesting that microbial competition plays a significant role in Ralstonia evolution. Rapid diversification of secreted effectors presents challenges for the development of disease-resistant plant varieties. We identified potential targets for disease resistance breeding by testing for allele-specific host recognition of T3Es present among South Korean phylotype I isolates. The integration of pathogen population genomics and molecular plant pathology contributes to location-specific disease control and to the development of plant cultivars with durable resistance to relevant threats.
-
- Responses to Human Interventions
- Antibiotics
-
-
Exploration into the origins and mobilization of di-hydrofolate reductase genes and the emergence of clinical resistance to trimethoprim
Trimethoprim is a synthetic antibacterial agent that targets folate biosynthesis by competitively binding to the di-hydrofolate reductase enzyme (DHFR). Trimethoprim is often administered synergistically with sulfonamide, another chemotherapeutic agent targeting the di-hydropteroate synthase (DHPS) enzyme in the same pathway. Clinical resistance to both drugs is widespread and mediated by enzyme variants capable of performing their biological function without binding to these drugs. These mutant enzymes were assumed to have arisen after the discovery of these synthetic drugs, but recent work has shown that genes conferring resistance to sulfonamide were present in the bacterial pangenome millions of years ago. Here, we apply phylogenetics and comparative genomics methods to study the largest family of mobile trimethoprim-resistance genes (dfrA). We show that most of the dfrA genes identified to date map to two large clades that likely arose from independent mobilization events. In contrast to sulfonamide resistance (sul) genes, we find evidence of recurrent mobilization in dfrA genes. Phylogenetic evidence allows us to identify novel dfrA genes in the emerging pathogen Acinetobacter baumannii, and we confirm their resistance phenotype in vitro. We also identify a cluster of dfrA homologues in cryptic plasmid and phage genomes, but we show that these enzymes do not confer resistance to trimethoprim. Our methods also allow us to pinpoint the chromosomal origin of previously reported dfrA genes, and we show that many of these ancient chromosomal genes also confer resistance to trimethoprim. Our work reveals that trimethoprim resistance predated the clinical use of this chemotherapeutic agent, but that novel mutations have likely also arisen and become mobilized following its widespread use within and outside the clinic. Hence, this work confirms that resistance to novel drugs may already be present in the bacterial pangenome, and stresses the importance of rapid mobilization as a fundamental element in the emergence and global spread of resistance determinants.
-
-
-
Azithromycin resistance mutations in Streptococcus pneumoniae as revealed by a chemogenomic screen
We report on the combination of chemical mutagenesis, azithromycin selection and next-generation sequencing (Mut-Seq) for the identification of small nucleotide variants that decrease the susceptibility of Streptococcus pneumoniae to the macrolide antibiotic azithromycin. Mutations in the 23S ribosomal RNA or in ribosomal proteins can confer resistance to macrolides and these were detected by Mut-Seq. By concentrating on recurrent variants, we could associate mutations in genes implicated in the metabolism of glutamine with decreased azithromycin susceptibility among S. pneumoniae mutants. Glutamine synthetase catalyses the transformation of glutamate and ammonium into glutamine and its chemical inhibition is shown to sensitize S. pneumoniae to antibiotics. A mutation affecting the ribosomal-binding site of a putative ribonuclease J2 is also shown to confer low-level resistance. Mut-Seq has the potential to reveal chromosomal changes enabling high resistance as well as novel events conferring more subtle phenotypes.
-
-
-
Comprehensive screening of genomic and metagenomic data reveals a large diversity of tetracycline resistance genes
Tetracyclines are broad-spectrum antibiotics used to prevent or treat a variety of bacterial infections. Resistance is often mediated through mobile resistance genes, which encode one of the three main mechanisms: active efflux, ribosomal target protection or enzymatic degradation. In the last few decades, a large number of new tetracycline-resistance genes have been discovered in clinical settings. These genes are hypothesized to originate from environmental and commensal bacteria, but the diversity of tetracycline-resistance determinants that have not yet been mobilized into pathogens is unknown. In this study, we aimed to characterize the potential tetracycline resistome by screening genomic and metagenomic data for novel resistance genes. By using probabilistic models, we predicted 1254 unique putative tetracycline resistance genes, representing 195 gene families (<70 % amino acid sequence identity), of which 164 families had not been described previously. Out of 17 predicted genes selected for experimental verification, 7 induced a resistance phenotype in an Escherichia coli host. Several of the predicted genes were located on mobile genetic elements or in regions that indicated mobility, suggesting that they can easily be shared between bacteria. Furthermore, phylogenetic analysis indicated several events of horizontal gene transfer between bacterial phyla. Our results also suggested that acquired efflux pumps originate from proteobacterial species, while ribosomal protection genes have been mobilized from Firmicutes and Actinobacteria. This study significantly expands the knowledge of known and putatively novel tetracycline resistance genes, their mobility and evolutionary history. The study also provides insights into the unknown resistome and genes that may be encountered in clinical settings in the future.
-
- Systems Microbiology
- Large-scale Comparative Genomics
-
-
The glycan alphabet is not universal: a hypothesis
Several monosaccharides constitute naturally occurring glycans, but it is uncertain whether they constitute a universal set like the alphabets of proteins and DNA. Based on the available experimental observations, it is hypothesized herein that the glycan alphabet is not universal. Data on the presence/absence of pathways for the biosynthesis of 55 monosaccharides in 12 939 completely sequenced archaeal and bacterial genomes are presented in support of this hypothesis. Pathways were identified by searching for homologues of biosynthesis pathway enzymes. Substantial variations were observed in the set of monosaccharides used by organisms belonging to the same phylum, genus and even species. Monosaccharides were grouped as common, less common and rare based on their prevalence in Archaea and Bacteria. It was observed that fewer enzymes are sufficient to biosynthesize monosaccharides in the common group. It appears that the common group originated before the formation of the three domains of life. In contrast, the rare group is confined to a few species in a few phyla, suggesting that these monosaccharides evolved much later. Fold conservation, as observed in aminotransferases and SDR (short-chain dehydrogenase reductase) superfamily members involved in monosaccharide biosynthesis, suggests neo- and sub-functionalization of genes led to the formation of the rare group monosaccharides. The non-universality of the glycan alphabet begets questions about the role of different monosaccharides in determining an organism’s fitness.
-
- Transcriptomics, Proteomics, Networks
-
-
Functional genomics reveals the toxin–antitoxin repertoire and AbiE activity in Serratia
Bacteriophage defences are divided into innate and adaptive systems. Serratia sp. ATCC 39006 has three CRISPR-Cas adaptive immune systems, but its innate immune repertoire is unknown. Here, we re-sequenced and annotated the Serratia genome and predicted its toxin–antitoxin (TA) systems. TA systems can provide innate phage defence through abortive infection by causing infected cells to ‘shut down’, limiting phage propagation. To assess TA system function on a genome-wide scale, we utilized transposon insertion and RNA sequencing. Of the 32 TA systems predicted bioinformatically, 4 resembled pseudogenes and 11 were demonstrated to be functional based on transposon mutagenesis. Three functional systems belonged to the poorly characterized but widespread AbiE abortive infection/TA family. AbiE is a type IV TA system with a predicted nucleotidyltransferase toxin. To investigate the mode of action of this toxin, we measured the transcriptional response to AbiEii expression. We observed dysregulated levels of tRNAs and propose that the toxin targets tRNAs, resulting in bacteriostasis. A recent report on a related toxin shows this occurs through addition of nucleotides to tRNA(s). This study has demonstrated the utility of functional genomics for probing TA function in a high-throughput manner, defined the TA repertoire in Serratia and shown the consequences of AbiE induction.
-
-
-
Transcriptomics reveal core activities of the plant growth-promoting bacterium Delftia acidovorans RAY209 during interaction with canola and soybean roots
The plant growth-promoting rhizobacterium Delftia acidovorans RAY209 is capable of establishing strong root attachment during early plant development at 7 days post-inoculation. The transcriptional response of RAY209 was measured using RNA-seq during early (day 2) and sustained (day 7) root colonization of canola plants, capturing RAY209 differentiation from a medium-suspended cell state to a strongly root-attached cell state. Transcriptomic data were collected in an identical manner during RAY209 interaction with soybean roots to explore the putative root colonization response to this globally relevant crop. Analysis indicated there is an increased number of significantly differentially expressed genes between medium-suspended and root-attached cells during early soybean root colonization relative to sustained colonization, while the opposite temporal pattern was observed for canola root colonization. Regardless of the plant host, root-attached RAY209 cells exhibited the least amount of differential gene expression between early and sustained root colonization. Cells attached to either canola or soybean roots expressed high levels of a fasciclin gene homolog encoding an adhesion protein, as well as genes encoding hydrolases, multiple biosynthetic processes, and membrane transport. Notably, while RAY209 ABC transporter genes of similar function were transcribed during attachment to either canola or soybean roots, several transporter genes were uniquely differentially expressed during colonization of the respective plant hosts. In turn, both canola and soybean plants expressed genes encoding pectin lyase and hydrolases – enzymes with purported function in remodelling extracellular matrices in response to RAY209 colonization. RAY209 exhibited both a core regulatory response and a plant host-specific regulatory response to root colonization, indicating that RAY209 specifically adjusts its cellular activities to adapt to the canola and soybean root environments. These transcriptomic data define the basic response of RAY209 as a commercial crop and seed inoculant for both canola and soybean.
-
- Genome Annotation, Metabolic Reconstructions
-
-
Ancestral state reconstruction of metabolic pathways across pangenome ensembles
As genome sequencing efforts are unveiling the genetic diversity of the biosphere at unprecedented speed, there is a need to accurately describe the structural and functional properties of groups of extant species whose genomes have been sequenced, as well as their inferred ancestors, at any given taxonomic level of their phylogeny. Elaborate approaches for the reconstruction of ancestral states at the sequence level have been developed, subsequently augmented by methods based on gene content. While these approaches of sequence or gene-content reconstruction have been successfully deployed, there has been less progress on the explicit inference of functional properties of ancestral genomes, in terms of metabolic pathways and other cellular processes. Herein, we describe PathTrace, an efficient algorithm for parsimony-based reconstructions of the evolutionary history of individual metabolic pathways, pivotal representations of key functional modules of cellular function. The algorithm is implemented as a five-step process through which pathways are represented as fuzzy vectors, where each enzyme is associated with a taxonomic conservation value derived from the phylogenetic profile of its protein sequence. The method is evaluated with a selected benchmark set of pathways against collections of genome sequences from key data resources. By deploying a pangenome-driven approach for pathway sets, we demonstrate that the inferred patterns are largely insensitive to noise, as opposed to gene-content reconstruction methods. In addition, the resulting reconstructions are closely correlated with the evolutionary distance of the taxa under study, suggesting that a diligent selection of target pangenomes is essential for maintaining cohesiveness of the method and consistency of the inference, serving as an internal control for an arbitrary selection of queries. The PathTrace method is a first step towards the large-scale analysis of metabolic pathway evolution and our deeper understanding of functional relationships reflected in emerging pangenome collections.
-
-
-
Identification of the conjugative and mobilizable plasmid fragments in the plasmidome using sequence signatures
Plasmids are the key element in horizontal gene transfer in the microbial community. Recently, a large number of experimental and computational methods have been developed to obtain the plasmidomes of microbial communities. Distinguishing transmissible plasmid sequences, which are derived from conjugative or at least mobilizable plasmids, from non-transmissible plasmid sequences in the plasmidome is essential for understanding the diversity of plasmids and how they regulate the microbial community. Unfortunately, due to the highly fragmented characteristics of DNA sequences in the plasmidome, effective identification methods are lacking. In this work, we used information entropy from information theory to assess the randomness of synonymous codon usage over 4424 plasmid genomes. The results showed that for all amino acids, the choice of a synonymous codon in conjugative and mobilizable plasmids is more random than that in non-transmissible plasmids, indicating that transmissible plasmids have different sequence signatures from non-transmissible plasmids. Inspired by this phenomenon, we further developed a novel algorithm named PlasTrans. PlasTrans takes the triplet code sequences and base sequences of plasmid DNA fragments as input and uses a convolutional neural network to extract the more complex signatures of the plasmid sequences and identify the conjugative and mobilizable DNA fragments. Tests showed that PlasTrans could achieve an AUC as high as 84–91%, even though the fragments only contained hundreds of base pairs. To the best of our knowledge, this is the first quantitative analysis of the difference in sequence signatures between transmissible and non-transmissible plasmids, and we developed the first tool to perform transferability annotation for DNA fragments in the plasmidome. We expect that PlasTrans will be a useful tool for researchers who analyse the properties of novel plasmids in the microbial community and horizontal gene transfer, especially the spread of resistance genes and virulence factors associated with plasmids. PlasTrans is freely available via https://github.com/zhenchengfang/PlasTrans
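The entropy measure mentioned in this abstract can be illustrated with Shannon entropy over the synonymous codons of a single amino acid: the more evenly the codons are used, the higher the value. The sketch below shows only that general idea and is not the PlasTrans code. It assumes an in-frame coding sequence; the helper name `codon_entropy`, the partial codon table and the toy sequences are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence

# Illustrative subset of the standard genetic code (assumption: enough
# for a demonstration; a real analysis would cover every amino acid).
SYNONYMOUS = {
    "Lys": ["AAA", "AAG"],
    "Gly": ["GGA", "GGC", "GGG", "GGT"],
}


def codon_entropy(cds: str, codons: Sequence[str]) -> float:
    """Shannon entropy (bits) of usage among one amino acid's codons.

    The sequence is read in frame from position 0; trailing bases that
    do not fill a triplet are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    triplets = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(t for t in triplets if t in codons)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # amino acid absent from this sequence
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical toy sequences, not real plasmid genes.
biased_lys = codon_entropy("AAAAAAAAAAAA", SYNONYMOUS["Lys"])   # only AAA
even_lys = codon_entropy("AAAAAGAAAAAG", SYNONYMOUS["Lys"])     # AAA and AAG equally
even_gly = codon_entropy("GGAGGCGGGGGT", SYNONYMOUS["Gly"])     # four codons equally
```

With two codons used equally the entropy is 1 bit, with four used equally it is 2 bits, and with a single codon it is 0, which matches the abstract's reading of higher entropy as more random codon choice.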
-
- Short Communication
-
- Microbial Communities
- Human
-
-
A comprehensive human minimal gut metagenome extends the host’s metabolic potential
Accumulating evidence suggests that humans could be considered as holobionts in which the gut microbiota perform essential functions. Initial metagenomic studies reported a pattern of shared genes in the gut microbiome of different individuals, leading to the definition of the minimal gut metagenome as the set of microbial genes necessary for homeostasis and present in all healthy individuals. This study analyses the minimal gut metagenome of the most comprehensive dataset available, including individuals from agriculturalist and industrialist societies, also embodying highly diverse ethnic and geographical backgrounds. The outcome, based on metagenomic predictions for community composition data, resulted in a minimal metagenome comprising 3412 genes, mapping to 1856 reactions and 128 metabolic pathways predicted to occur across all individuals. These results were substantiated by the analysis of two additional datasets describing the microbial community compositions of larger Western cohorts, as well as a substantial shotgun metagenomics dataset. Subsequent analyses showed the plausible metabolic complementarity provided by the minimal gut metagenome to the human genome.
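The definition used in this abstract, the set of genes present in all individuals, corresponds to a set intersection over per-individual gene sets. The following is a minimal sketch of that definition only; it does not reproduce the study's metagenome prediction pipeline. The function name `minimal_gene_set`, the sample labels and the gene identifiers are hypothetical.

```python
from typing import Dict, Iterable, Set


def minimal_gene_set(genes_by_individual: Dict[str, Iterable[str]]) -> Set[str]:
    """Return the genes found in every individual (set intersection)."""
    gene_sets = [set(genes) for genes in genes_by_individual.values()]
    if not gene_sets:
        return set()  # no individuals, so no shared genes
    return set.intersection(*gene_sets)


# Hypothetical toy data: gene identifiers predicted for three individuals.
predicted = {
    "individual_1": ["geneA", "geneB", "geneC"],
    "individual_2": ["geneA", "geneC", "geneD"],
    "individual_3": ["geneA", "geneC"],
}

core = minimal_gene_set(predicted)
```

In this toy input only `geneA` and `geneC` occur in all three individuals, so they form the minimal set. A strict intersection is sensitive to a single incomplete sample, which is one reason a real analysis might need a looser criterion.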
-
- Method
-
- Genomic Methodologies
- Genome-phenotype Association
-
-
Hogwash: three methods for genome-wide association studies in bacteria
Bacterial genome-wide association studies (bGWAS) capture associations between genomic variation and phenotypic variation. Convergence-based bGWAS methods identify genomic mutations that occur independently multiple times on the phylogenetic tree in the presence of phenotypic variation more often than is expected by chance. This work introduces hogwash, an open source R package that implements three algorithms for convergence-based bGWAS. Hogwash additionally contains two burden testing approaches to perform gene or pathway analysis to improve power and increase convergence detection for related but weakly penetrant genotypes. To identify optimal use cases, we applied hogwash to data simulated with a variety of phylogenetic signals and convergence distributions. These simulated data are publicly available and contain the relevant metadata regarding convergence and phylogenetic signal for each phenotype and genotype. Hogwash is available for download from GitHub.
-