Volume 10, Issue 4, 2024
- Reviews
- Metagenomics and Microbiomes
Deep learning methods in metagenomics: a review
The ever-decreasing cost of sequencing and the growing potential applications of metagenomics have led to an unprecedented surge in data generation. One of the most prevalent applications of metagenomics is the study of microbial environments, such as the human gut. The gut microbiome plays a crucial role in human health, providing vital information for patient diagnosis and prognosis. However, analysing metagenomic data remains challenging due to several factors, including reference catalogues, sparsity and compositionality. Deep learning (DL) enables novel and promising approaches that complement state-of-the-art microbiome pipelines. DL-based methods can address almost all aspects of microbiome analysis, including novel pathogen detection, sequence classification, patient stratification and disease prediction. Beyond generating predictive models, a key aspect of these methods is also their interpretability. This article reviews DL approaches in metagenomics, including convolutional networks, autoencoders and attention-based models. These methods aggregate contextualized data and pave the way for improved patient care and a better understanding of the microbiome’s key role in our health.
- Research Articles
- Genomic Methodologies
Identifying the best PCR enzyme for library amplification in NGS
Background. PCR amplification is a necessary step in many next-generation sequencing (NGS) library preparation methods [1, 2]. Whilst many PCR enzymes are developed to amplify single targets efficiently, accurately and with specificity, few are developed to meet the challenges imposed by NGS PCR, namely unbiased amplification of a wide range of different sizes and GC content. As a result, PCR amplification during NGS library prep often results in bias toward GC-neutral and smaller fragments. As NGS has matured, optimized NGS library prep kits and polymerase formulations have emerged. In this study, we tested a wide selection of available enzymes for both short-read Illumina library preparation and long fragment amplification ahead of long-read sequencing.
We tested over 20 different high-fidelity PCR enzymes/NGS amplification mixes on a range of Illumina library templates of varying GC content and composition, and found that both the yield and the genome coverage uniformity of the commercially available enzymes varied dramatically. Three enzymes, Quantabio RepliQa Hifi Toughmix, Watchmaker Library Amplification Hot Start Master Mix (2X) ‘Equinox’ and Takara Ex Premier, were found to give a consistent performance over all genomes that closely mirrored that observed for PCR-free datasets. We also tested a range of enzymes for long-read sequencing by amplifying size-fractionated S. cerevisiae DNA of average size 21.6 and 13.4 kb, respectively.
The enzymes of choice for short-read (Illumina) library fragment amplification are Quantabio RepliQa Hifi Toughmix, Watchmaker Library Amplification Hot Start Master Mix (2X) ‘Equinox’ and Takara Ex Premier, with RepliQa also being the best-performing of the enzymes tested for long fragment amplification prior to long-read sequencing.
A comparative genomics approach reveals a local genetic signature of Leishmania tropica in Morocco
In Morocco, cutaneous leishmaniasis (CL) caused by Leishmania (L.) tropica is an important health problem. Despite the high incidence of CL in the country, the genomic heterogeneity of these parasites is still incompletely understood. In this study, we sequenced the genomes of 14 Moroccan isolates of L. tropica collected from confirmed cases of CL to investigate their genomic heterogeneity. Comparative genomics analyses were conducted by applying the recently established Genome Instability Pipeline (GIP), which allowed us to conduct phylogenomic and principal components analyses (PCA), and to assess genomic variations at the levels of the karyotype, gene copy number, single nucleotide polymorphisms (SNPs) and small insertion/deletion (INDEL) variants. Read-depth analyses revealed a mostly disomic karyotype, with the exception of the stable tetrasomy of chromosome 31. In contrast, we identified important gene copy number variations across all isolates, which affect known virulence genes and thus were probably selected in the field. SNP-based cluster analysis of the 14 isolates revealed a core group of 12 strains that formed a tight cluster and shared 45.1 % (87 751) of SNPs, as well as two strains (M3015, Ltr_16) that clustered separately from each other and the core group, suggesting the circulation of genetically highly diverse strains in Morocco. Phylogenetic analysis, which compared our 14 L. tropica isolates against 40 published genomes of L. tropica from a diverse array of locations, confirmed the genetic difference of our Moroccan isolates from all other isolates examined. In conclusion, our results indicate potential regional variations in SNP profiles that may differentiate Moroccan L. tropica from other L. tropica strains circulating in endemic countries in the Middle East.
Our report paves the way for future research with a larger number of strains that will allow correlation of diverse phenotypes (resistance to treatments, virulence) and origins (geography, host species, year of isolation) to defined genomic signals such as gene copy number variations or SNP profiles that may represent interesting biomarker candidates.
Nanopore and Illumina sequencing reveal different viral populations from human gut samples
The advent of viral metagenomics, or viromics, has improved our knowledge and understanding of global viral diversity. High-throughput sequencing technologies enable explorations of the ecological roles, contributions to host metabolism, and the influence of viruses in various environments, including the human intestinal microbiome. However, bacterial metagenomic studies frequently have the advantage. The adoption of advanced technologies like long-read sequencing has the potential to be transformative in refining viromics and metagenomics. Here, we examined the effectiveness of long-read and hybrid sequencing by comparing Illumina short-read and Oxford Nanopore Technologies (ONT) long-read sequencing, together with different assembly strategies, for recovering viral genomes from human faecal samples. Our findings showed that if a single sequencing technology is to be chosen for virome analysis, Illumina is preferable due to its superior ability to recover fully resolved viral genomes and minimise erroneous genomes. While ONT assemblies were effective in recovering viral diversity, the challenges related to input requirements and the necessity for amplification made ONT less ideal as a standalone solution. However, using a combined, hybrid approach enabled a more authentic representation of viral diversity to be obtained within samples.
Genome sequences of the first Autographiviridae phages infecting marine Roseobacter
The ubiquitous and abundant marine phages play critical roles in shaping the composition and function of bacterial communities, impacting biogeochemical cycling in marine ecosystems. Autographiviridae is among the most abundant and ubiquitous phage families in the ocean. However, studies on the diversity and ecology of Autographiviridae phages in marine environments are restricted to isolates that infect SAR11 bacteria and cyanobacteria. In this study, ten new roseophages that infect marine Roseobacter strains were isolated from coastal waters. These new roseophages have a genome size ranging from 38 917 to 42 634 bp and G+C content of 44.6–50 %. Comparative genomics showed that they are similar to known Autographiviridae phages regarding gene content and architecture, thus representing the first Autographiviridae roseophages. Phylogenomic analysis based on concatenated conserved genes showed that the ten roseophages form three distinct subgroups within the Autographiviridae, and sequence analysis revealed that they belong to eight new genera. Finally, viromic read-mapping showed that these new Autographiviridae phages are widely distributed in global oceans, mostly inhabiting polar and estuarine locations. This study has expanded the current understanding of the genomic diversity, evolution and ecology of Autographiviridae phages and roseophages. We suggest that Autographiviridae phages play important roles in the mortality and community structure of roseobacters, and have broad ecological applications.
- Functional Genomics and Microbe–Niche Interactions
Chromosome-scale assembly of the streamlined picoeukaryote Picochlorum sp. SENEW3 genome reveals Rabl-like chromatin structure and potential for C4 photosynthesis
Genome sequencing and assembly of the photosynthetic picoeukaryote Picochlorum sp. SENEW3 revealed a compact genome with a reduced gene set, few repetitive sequences, and an organized Rabl-like chromatin structure. Hi-C chromosome conformation capture revealed evidence of possible chromosomal translocations, as well as putative centromere locations. Maintenance of relatively few selenoproteins, as compared to the similarly sized marine picoprasinophytes Mamiellales, and broad halotolerance compared to others in Trebouxiophyceae, suggest evolutionary adaptation to variable salinity environments. Such adaptation may have driven size and genome minimization and have been enabled by the retention of a high number of membrane transporters. Identification of required pathway genes for both CAM and C4 photosynthetic carbon fixation, known to exist in the marine Mamiellales picoprasinophytes and the seaweed Ulva, but in few other chlorophyte species, further highlights the unique adaptations of this robust alga. This high-quality assembly provides a significant advance in the resources available for genomic investigations of this and other photosynthetic picoeukaryotes.
Ecology shapes the genomic and biosynthetic diversification of Streptomyces bacteria from insectivorous bats
Streptomyces are prolific producers of secondary metabolites from which many clinically useful compounds have been derived. They inhabit diverse habitats but have rarely been reported in vertebrates. Here, we aim to determine to what extent the ecological source (bat host species and cave sites) influences the genomic and biosynthetic diversity of Streptomyces bacteria. We analysed draft genomes of 132 Streptomyces isolates sampled from 11 species of insectivorous bats from six cave sites in Arizona and New Mexico, USA. We delineated 55 species based on the genome-wide average nucleotide identity and core genome phylogenetic tree. Streptomyces isolates that colonize the same bat species or inhabit the same site exhibit greater overall genomic similarity to each other than they do to Streptomyces from other bat species or sites. However, when considering biosynthetic gene clusters (BGCs) alone, BGC distribution is not structured by the ecological or geographical source of the Streptomyces that carry them. Each genome carried between 19 and 65 BGCs (median=42.5), and the number varied even among members of the same Streptomyces species. Nine major classes of BGCs were detected in ten of the 11 bat species and in all sites: terpene, non-ribosomal peptide synthetase, polyketide synthase, siderophore, RiPP-like, butyrolactone, lanthipeptide, ectoine and melanin. Finally, Streptomyces genomes carry multiple hybrid BGCs consisting of signature domains from two to seven distinct BGC classes. Taken together, our results bring critical insights to understanding Streptomyces-bat ecology and BGC diversity that may contribute to bat health and to augmenting current efforts in natural product discovery, especially from underexplored or overlooked environments.
Comparative genomics of a novel Erwinia species associated with the Highland midge (Culicoides impunctatus)
Erwinia (Enterobacterales: Erwiniaceae) are a group of cosmopolitan bacteria best known as the causative agents of various plant diseases. However, other species in this genus have been found to play important roles as insect endosymbionts supplementing the diet of their hosts. Here, I describe Candidatus Erwinia impunctatus (Erwimp) associated with the Highland midge Culicoides impunctatus (Diptera: Ceratopogonidae), an abundant biting pest in the Scottish Highlands. The genome of this new Erwinia species was assembled using hybrid long and short read techniques, and a comparative analysis was undertaken with other members of the genus to understand its potential ecological niche and impact. Genome composition analysis revealed that Erwimp is similar to other endophytic and ectophytic species in the genus and is unlikely to be restricted to its insect host. Evidence for an additional plant host includes the presence of a carotenoid synthesis operon implicated as a virulence factor in plant-associated members of the sister genus Pantoea. Unique features of Erwimp include several copies of intimin-like proteins which, along with signs of genome pseudogenization and a loss of certain metabolic pathways, suggest an element of host restriction seen elsewhere in the genus. Furthermore, a screening of individuals over two field seasons revealed the absence of the bacteria in Culicoides impunctatus during the second year, indicating that this microbe-insect interaction is likely to be transient. These data suggest that Culicoides impunctatus may have an important role to play beyond a biting nuisance, as an insect vector transmitting Erwimp alongside any conferred impacts to surrounding biota.
- Pathogens and Epidemiology
Investigating the impact of insertion sequences and transposons in the genomes of the most significant phytopathogenic bacteria
Genetic variability in phytopathogens is one of the main problems encountered for effective plant disease control. This fact may be related to the presence of transposable elements (TEs), but little is known about their role in host genomes. Here, we performed the most comprehensive analysis of insertion sequences (ISs) and transposons (Tns) in the genomes of the most important bacterial plant pathogens. A total of 35 692 ISs and 71 Tns were identified in 270 complete genomes. The level of pathogen–host specialization was found to be a significant determinant of the element distribution among the species. Some Tns were identified as carrying virulence factors, such as genes encoding effector proteins of the type III secretion system and resistance genes for the antimicrobial streptomycin. Evidence for IS-mediated ectopic recombination was identified in Xanthomonas genomes. Moreover, we found that IS elements tend to be inserted in regions near virulence and fitness genes, such as ISs disrupting avirulence genes in X. oryzae genomes. In addition, transcriptome analysis under different stress conditions revealed differences in the expression of genes encoding transposases in the Ralstonia solanacearum, X. oryzae, and P. syringae species. Lastly, we also investigated the role of Tns in regulation via small noncoding regulatory RNAs and found that these elements may target plant-cell transcriptional activators. Taken together, the results indicate that TEs may have a fundamental role in variability and virulence in plant pathogenic bacteria.
Patterns recovered in phylogenomic analysis of Candida auris and close relatives implicate broad environmental flexibility in Candida/Clavispora clade yeasts
Fungal pathogens commonly originate from benign or non-pathogenic strains living in the natural environment. The recently emerged human pathogen, Candida auris, is one example of a fungus believed to have originated in the environment and recently transitioned into a clinical setting. To date, however, there is limited evidence about the origins of this species in the natural environment and when it began associating with humans. One approach to overcome this gap is to reconstruct phylogenetic relationships (1) between strains isolated from clinical and non-clinical environments and (2) between species known to cause disease in humans and benign environmental saprobes. C. auris belongs to the Candida/Clavispora clade, a diverse group of 45 yeast species including human pathogens and environmental saprobes. We present a phylogenomic analysis of the Candida/Clavispora clade aimed at understanding the ecological breadth and evolutionary relationships between an expanded sample of environmentally and clinically isolated yeasts. To build a robust framework for investigating these relationships, we developed a whole-genome sequence dataset of 108 isolates representing 18 species, including four newly sequenced species and 18 environmentally isolated strains. Our phylogeny, based on 619 orthologous genes, shows environmentally isolated species and strains interspersed with clinically isolated counterparts, suggesting that there have been many transitions between humans and the natural environment in this clade. Our findings highlight the breadth of environments these yeasts inhabit and imply that many clinically isolated yeasts in this clade could just as easily live outside the human body in diverse natural environments and vice versa.
Evaluating the impact of genomic epidemiology of methicillin-resistant Staphylococcus aureus (MRSA) on hospital infection prevention and control decisions
Genomic epidemiology enhances the ability to detect and refute methicillin-resistant Staphylococcus aureus (MRSA) outbreaks in healthcare settings, but its routine introduction requires further evidence of benefits for patients and resource utilization. We performed a 12-month prospective study at Cambridge University Hospitals NHS Foundation Trust in the UK to capture its impact on hospital infection prevention and control (IPC) decisions. MRSA-positive samples were identified via the hospital microbiology laboratory between November 2018 and November 2019. We included samples from in-patients, clinic out-patients, people reviewed in the Emergency Department and healthcare workers screened by Occupational Health. We sequenced the first MRSA isolate from 823 consecutive individuals, defined their pairwise genetic relatedness, and sought epidemiological links in the hospital and community. Genomic analysis of the 823 MRSA isolates identified 72 genetic clusters of two or more isolates, containing 339/823 (41 %) of the cases. Epidemiological links were identified between two or more cases for 190 (23 %) individuals in 34/72 clusters. Weekly genomic epidemiology updates were shared with the IPC team, culminating in 49 face-to-face meetings and 21 written communications. Seventeen clusters were identified that were consistent with hospital MRSA transmission, discussion of which led to additional IPC actions in 14 of these. Two outbreaks were also identified where transmission had occurred in the community prior to hospital presentation; these were escalated to relevant IPC teams. We identified 38 instances where two or more in-patients shared a ward location on overlapping dates but carried unrelated MRSA isolates (pseudo-outbreaks); research data led to de-escalation of investigations in six of these. Our findings provide further support for the routine use of genomic epidemiology to enhance and target IPC resources.
- Short Communications
- Pathogens and Epidemiology
Whole-genome assembly of a novel invertebrate herpesvirus from the gastropod Babylonia areolata
Molluscan herpesviruses cause disease in species of major importance to aquaculture and are the only known herpesviruses to infect invertebrates, which lack an adaptive immune system. Understanding the evolution of malacoherpesviruses in relation to their hosts will likely require comparative genomic studies on multiple phylogenetic scales. Currently, only two malacoherpesvirus species have genomes that have been fully assembled, which limits the ability to perform comparative genomic studies on this family of viruses. In the present study, we fully assemble a herpesvirus from Illumina and Nanopore sequence data that were previously used to assemble the genome of the gastropod Babylonia areolata. We tentatively assign this novel herpesvirus to the genus Aurivirus within the family Malacoherpesviridae based on a phylogenetic analysis of DNA polymerase. While structurally similar to other malacoherpesvirus genomes, a synteny analysis of the novel herpesvirus with another Aurivirus species indicates that genomic rearrangements might be an important process in the evolution of this genus. We anticipate that future complete assemblies of malacoherpesviruses will be a valuable resource in comparative herpesvirus research.
- Methods
- Genomic Methodologies
Genome-wide annotation of transcript boundaries using bacterial Rend-seq datasets
Accurate annotation to single-nucleotide resolution of the transcribed regions in genomes is key to the optimal analysis of RNA-seq data, the understanding of regulatory events and the design of experiments. However, most genome annotations currently provided by GenBank lack information about untranslated regions (UTRs). Additionally, information regarding genomic locations of non-coding RNAs, such as sRNAs or antisense RNAs, is frequently missing. To provide such information, diverse RNA-seq technologies, such as Rend-seq, have been developed and applied to many bacterial species. However, incorporating this vast amount of information into annotation files has been limited and is bioinformatically challenging, resulting in UTRs and other non-coding elements being overlooked or misrepresented. To overcome this problem, we present pyRAP (python Rend-seq Annotation Pipeline), a software package that analyses Rend-seq datasets to accurately resolve transcript boundaries genome-wide. We report the use of pyRAP to find novel transcripts, transcript isoforms, and RNase-dependent sRNA processing events. In Bacillus subtilis we uncovered 63 novel transcripts and provide genomic coordinates with single-nucleotide resolution for 2218 5′UTRs, 1864 3′UTRs and 161 non-coding RNAs. In Escherichia coli, we report 117 novel transcripts, 2429 5′UTRs, 1619 3′UTRs and 91 non-coding RNAs, and in Staphylococcus aureus, 16 novel transcripts, 664 5′UTRs, 696 3′UTRs, and 81 non-coding RNAs. Finally, we use pyRAP to produce updated annotation files for B. subtilis 168, E. coli K-12 MG1655, and S. aureus 8325 for use in the wider microbial genomics research community.
