- Volume 9, Issue 1, 2023
- Bioresources
- Genomic Methodologies
TransAAP: an automated annotation pipeline for membrane transporter prediction in bacterial genomes
Membrane transporters are a large group of proteins that span cell membranes and contribute to critical cell processes, including delivery of essential nutrients, ejection of waste products, and helping the cell sense environmental conditions. An accurate and specific annotation of the transporter proteins encoded by a micro-organism can reveal its likely nutritional preferences and environmental niche(s), and can identify novel transporters that could be used for small-molecule production in industrial biotechnology. The Transporter Automated Annotation Pipeline (TransAAP) (http://www.membranetransport.org/transportDB2/TransAAP_login.html) is a fully automated web service that predicts and annotates the membrane transport proteins of an organism from its genome sequence. It uses comparisons with curated databases such as the TCDB (Transporter Classification Database) and TDB, as well as selected Pfams and TIGRFAMs of transporter families and other methodologies. TransAAP was used to annotate transporter genes in the prokaryotic genomes in the National Center for Biotechnology Information (NCBI) RefSeq; these annotations are presented on the TransportDB website (http://www.membranetransport.org), which offers a suite of data visualization and analysis tools. Creating and maintaining a bioinformatic database specific to transporters across all genomic datasets is essential for microbiology research groups and the wider research and biotechnology community. Such a database gives them a detailed picture of membrane transporter systems in various environments, as well as comprehensive information on specific membrane transport proteins.
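The abstract does not spell out how TransAAP scores its database comparisons. As a rough illustration only, the Python sketch below shows homology-based transfer of Transporter Classification (TC) numbers. Each query protein keeps its best-scoring hit to a curated transporter, and the TC family of that hit is reported only if it clears identity, coverage and e-value cut-offs. The `Hit` record, the function names and all threshold values are hypothetical, not TransAAP's actual parameters.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str        # protein ID from the genome being annotated
    subject_tc: str   # TC number of the matched curated transporter, e.g. "2.A.1.1.1"
    identity: float   # percent identity of the alignment
    coverage: float   # fraction of the query covered by the alignment (0-1)
    evalue: float


def tc_family(tc_number: str) -> str:
    """Truncate a TC number (class.subclass.family.subfamily.system) to the family level."""
    return ".".join(tc_number.split(".")[:3])


def assign_families(hits, min_identity=30.0, min_coverage=0.7, max_evalue=1e-10):
    """Return {query: TC family} using each query's best (lowest e-value) hit,
    provided that hit passes all three hypothetical thresholds."""
    best = {}
    for h in hits:
        if h.query not in best or h.evalue < best[h.query].evalue:
            best[h.query] = h
    return {
        q: tc_family(h.subject_tc)
        for q, h in best.items()
        if h.identity >= min_identity
        and h.coverage >= min_coverage
        and h.evalue <= max_evalue
    }


hits = [
    Hit("geneA", "2.A.1.1.1", 45.0, 0.90, 1e-50),
    Hit("geneA", "3.A.1.1.1", 25.0, 0.50, 1e-5),
    Hit("geneB", "2.A.3.1.1", 20.0, 0.40, 1e-3),
]
print(assign_families(hits))  # {'geneA': '2.A.1'}
```

Truncating to the family level (for example `2.A.1`, the Major Facilitator Superfamily in TCDB) is a deliberately conservative choice. Subfamily or substrate assignments generally need stronger evidence than a single pairwise hit.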
- Pathogens and Epidemiology
The DataHarmonizer: a tool for faster data harmonization, validation, aggregation and analysis of pathogen genomics contextual information
Ivan S. Gill, Emma J. Griffiths, Damion Dooley, Rhiannon Cameron, Sarah Savić Kallesøe, Nithu Sara John, Anoosha Sehar, Gurinder Gosal, David Alexander, Madison Chapel, Matthew A. Croxen, Benjamin Delisle, Rachelle Di Tullio, Daniel Gaston, Ana Duggan, Jennifer L. Guthrie, Mark Horsman, Esha Joshi, Levon Kearny, Natalie Knox, Lynette Lau, Jason J. LeBlanc, Vincent Li, Pierre Lyons, Keith MacKenzie, Andrew G. McArthur, Emily M. Panousis, John Palmer, Natalie Prystajecky, Kerri N. Smith, Jennifer Tanner, Christopher Townend, Andrea Tyler, Gary Van Domselaar and William W. L. Hsiao
Pathogen genomics is a critical tool for public health surveillance, infection control and outbreak investigations, as well as for research. To make use of pathogen genomics data, they must be interpreted using contextual data (metadata). Contextual data include sample metadata, laboratory methods, patient demographics, clinical outcomes and epidemiological information. However, variability in how contextual information is captured by different authorities and encoded in different databases poses challenges for data interpretation and integration, and for data use and reuse. The DataHarmonizer is a template-driven spreadsheet application for harmonizing, validating and transforming genomics contextual data into submission-ready formats for public or private repositories. The tool’s web-browser-based JavaScript environment enables validation, and its offline functionality and local installation increase data security. The DataHarmonizer was developed to address the data-sharing needs that arose during the COVID-19 pandemic. Members of the Canadian COVID Genomics Network (CanCOGeN) used it to harmonize SARS-CoV-2 contextual data for national surveillance and for public repository submission.
To support the coordination of international surveillance efforts, we have partnered with the Public Health Alliance for Genomic Epidemiology to also provide a template conforming to its SARS-CoV-2 contextual data specification for worldwide use. Templates are also being developed for One Health and foodborne pathogens. Overall, the DataHarmonizer improves the effectiveness and fidelity of contextual data capture, as well as the subsequent usability of those data. Harmonizing contextual information globally across authorities, platforms and systems improves the interoperability and reusability of data for concerted public health and research initiatives to fight the current pandemic and future public health emergencies. Although the tool was initially developed for the COVID-19 pandemic, its expansion to other data management applications and pathogens is already underway.
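The template-driven validation described above can be pictured with a minimal Python sketch. It assumes a template is simply a list of fields, each optionally required and optionally restricted to a picklist of allowed terms. The field names and picklist values are hypothetical and only loosely modelled on SARS-CoV-2 contextual data. The DataHarmonizer itself runs in the browser as JavaScript, and its real templates are richer than this.

```python
from dataclasses import dataclass, field


@dataclass
class FieldSpec:
    name: str
    required: bool = False
    allowed: frozenset = field(default_factory=frozenset)  # empty = free text


def validate(record: dict, template: list) -> list:
    """Check one spreadsheet row against a template; return human-readable errors."""
    errors = []
    for spec in template:
        value = str(record.get(spec.name, "")).strip()
        if not value:
            if spec.required:
                errors.append(f"{spec.name}: required value missing")
            continue
        if spec.allowed and value not in spec.allowed:
            errors.append(f"{spec.name}: '{value}' is not an allowed term")
    return errors


# Hypothetical template fields, loosely modelled on a SARS-CoV-2 contextual data specification.
TEMPLATE = [
    FieldSpec("specimen_collector_sample_id", required=True),
    FieldSpec("organism", required=True,
              allowed=frozenset({"Severe acute respiratory syndrome coronavirus 2"})),
    FieldSpec("host_common_name", allowed=frozenset({"Human", "Cat", "Dog"})),
]

row = {"specimen_collector_sample_id": "S001", "organism": "SARS-CoV-2", "host_common_name": "Human"}
print(validate(row, TEMPLATE))  # one error: the organism value is not a picklist term
```

A row with a free-text organism name fails the picklist check. This is the kind of inconsistency that harmonization is meant to catch before repository submission.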
- Research Articles
- Genomic Methodologies
Comparison of R9.4.1/Kit10 and R10/Kit12 Oxford Nanopore flowcells and chemistries in bacterial genome reconstruction
Complete, accurate, cost-effective and high-throughput reconstruction of bacterial genomes for large-scale genomic epidemiological studies is currently only possible with hybrid assembly, which combines long-read (typically nanopore) and short-read (Illumina) datasets. Being able to use nanopore-only data would be a significant advance. Oxford Nanopore Technologies (ONT) have recently released a new flowcell (R10.4) and chemistry (Kit12), which reportedly generate per-read accuracies rivalling those of Illumina data. To evaluate this, we sequenced DNA extracts from four commonly studied bacterial pathogens (Escherichia coli, Klebsiella pneumoniae, Pseudomonas aeruginosa and Staphylococcus aureus) using Illumina and ONT’s R9.4.1/Kit10, R10.3/Kit12 and R10.4/Kit12 flowcells/chemistries. We compared raw read accuracy and assembly accuracy for each modality, considering the impact of different nanopore basecalling models, commonly used assemblers, sequencing depth, and the use of duplex versus simplex reads. ‘Super accuracy’ (sup) basecalled R10.4 reads, in particular duplex reads, have high per-read accuracies and could be used to robustly reconstruct bacterial genomes without Illumina data. However, the per-run yield of duplex reads generated in our hands with standard sequencing protocols was low (typically <10 %). This has substantial implications for cost and throughput if bacterial genome reconstruction relies on nanopore data alone. In addition, recovery of small plasmids with the best-performing long-read assembler (Flye) was inconsistent. R10.4/Kit12 combined with sup basecalling holds promise as a single sequencing technology for reconstructing commonly studied bacterial genomes. However, hybrid assembly (Illumina + R9.4.1 with high-accuracy (hac) basecalling) currently remains the highest-throughput, most robust and most cost-effective approach to fully reconstructing these genomes.
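Two quantities in this comparison are easy to make concrete. Per-read accuracy is commonly expressed as a Phred-scaled quality, Q = -10·log10(1 - accuracy), so 99 % accuracy is Q20 and 99.9 % is Q30. The second function shows why a low duplex fraction matters for throughput: if only duplex reads are used, the usable depth per multiplexed genome shrinks in proportion. The run yield, genome size and number of barcodes below are hypothetical round numbers, not values from the study.

```python
import math


def phred_q(accuracy: float) -> float:
    """Phred-scaled quality for a per-read accuracy in [0, 1): Q = -10 * log10(1 - accuracy)."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


def mean_depth(run_yield_bases: float, fraction_used: float,
               genome_size: float, genomes_per_run: int) -> float:
    """Mean per-genome depth when only `fraction_used` of a run's yield is usable
    (e.g. the duplex fraction) and the run is split evenly across multiplexed genomes."""
    return run_yield_bases * fraction_used / (genome_size * genomes_per_run)


print(round(phred_q(0.99), 1), round(phred_q(0.999), 1))  # 20.0 30.0

# Hypothetical run: 10 Gb yield shared by 12 barcoded 5 Mb genomes.
simplex = mean_depth(10e9, 1.0, 5e6, 12)   # all reads
duplex = mean_depth(10e9, 0.10, 5e6, 12)   # a 10 % duplex fraction
print(round(simplex, 1), round(duplex, 1))  # 166.7 16.7
```

Under these assumptions, a tenfold drop in usable depth means either roughly ten times fewer genomes per run or ten times more runs for the same depth, which is the cost and throughput concern raised in the abstract.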
- Functional Genomics and Microbe–Niche Interactions
Biosynthetic novelty index reveals the metabolic potential of rare actinobacteria isolated from highly oligotrophic sediments
Calculations predict that testing 5 000–10 000 molecules and spending >1 billion US dollars (£0.8 billion, £1=$1.2) are required for a single drug to come to market. One solution to this problem is to establish more efficient protocols that reduce the high rate of re-isolation and continuous rediscovery of natural products during the early stages of drug development. The study of ‘rare actinobacteria’ has emerged as a possible approach for increasing the discovery rate of drug leads from natural sources. Here, we define a simple genomic metric, the biosynthetic novelty index (BiNI), that can be used to rapidly rank strains according to the novelty of the biosynthetic gene clusters they encode. By comparing a subset of high-quality genomes from strains of different taxonomic and ecological backgrounds, we used the BiNI score to support the notion that rare actinobacteria encode more biosynthetic gene cluster (BGC) novelty. In addition, we present the isolation and genomic characterization (focused on specialized metabolites) and the phenotypic screening of two isolates belonging to the genera Lentzea and Actinokineospora from a highly oligotrophic environment. Our results show that both strains harbour a unique subset of BGCs compared with other members of the genera Lentzea and Actinokineospora. These BGCs are responsible for potent antimicrobial and cytotoxic bioactivity. The experimental data and analysis presented in this study contribute to knowledge of genome mining in rare actinobacteria and, most importantly, can help direct sampling efforts to accelerate the early stages of the drug discovery pipeline.
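The abstract does not give the BiNI formula, so the sketch below is not the published metric. It is a hypothetical stand-in that captures the same intent. Each strain is scored by the fraction of its biosynthetic gene clusters whose gene cluster family (GCF) is absent from a reference collection, and strains are then ranked by that fraction. The strain names, GCF identifiers and the idea of a fixed reference set are all assumptions for illustration.

```python
def novelty_ranking(strain_gcfs: dict, reference_gcfs: set) -> list:
    """Rank strains by the fraction of their BGC families absent from a reference set.

    strain_gcfs maps strain -> set of gene cluster family (GCF) IDs for its BGCs.
    This is an illustrative novelty score, not the published BiNI formula.
    """
    scores = {}
    for strain, gcfs in strain_gcfs.items():
        novel = gcfs - reference_gcfs
        scores[strain] = len(novel) / len(gcfs) if gcfs else 0.0
    # Highest novelty first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


strains = {
    "strain_A": {"GCF1", "GCF2", "GCF3", "GCF4"},
    "strain_B": {"GCF1", "GCF5"},
    "strain_C": {"GCF2", "GCF5", "GCF6"},
}
reference = {"GCF1", "GCF5"}
print(novelty_ranking(strains, reference))
```

Using a fraction rather than a raw count keeps strains with many BGCs from dominating the ranking purely through genome size. The abstract does not state whether BiNI normalizes in this way.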
- Microbial Communities
Population genomics of Australian indigenous Mesorhizobium reveals diverse nonsymbiotic genospecies capable of nitrogen-fixing symbioses following horizontal gene transfer
Mesorhizobia are soil bacteria that establish nitrogen-fixing symbioses with various legumes. Novel symbiotic mesorhizobia frequently evolve following horizontal transfer of symbiosis-gene-carrying integrative and conjugative elements (ICESyms) to indigenous mesorhizobia in soils. Evolved symbionts exhibit a wide range of symbiotic effectiveness, with some fixing nitrogen poorly or not at all. Little is known about the genetic diversity and symbiotic potential of indigenous soil mesorhizobia prior to ICESym acquisition. Here we sequenced the genomes of 144 Mesorhizobium spp. strains cultured directly from cultivated and uncultivated Australian soils. Of these, 126 lacked symbiosis genes. The only symbiotic strains isolated were either exotic strains previously used as legume inoculants or indigenous mesorhizobia that had acquired exotic ICESyms. No native symbiotic strains were identified. Indigenous nonsymbiotic strains formed 22 genospecies whose phylogenomic diversity overlapped that of internationally isolated symbiotic Mesorhizobium spp. The genomes of indigenous mesorhizobia showed no evidence of prior involvement in nitrogen-fixing symbiosis. Yet their core genomes were similar to those of symbiotic strains, and they generally lacked genes for the synthesis of biotin, nicotinate and thiamine. Genomes of nonsymbiotic mesorhizobia harboured mobile elements similar to those of symbiotic mesorhizobia, including ICESym-like elements carrying the aforementioned vitamin-synthesis genes but lacking symbiosis genes. Diverse indigenous isolates that received ICESyms through horizontal gene transfer formed effective symbioses with Lotus and Biserrula legumes. This indicates that most nonsymbiotic mesorhizobia have an innate capacity for nitrogen-fixing symbiosis following ICESym acquisition. Non-fixing ICESym-harbouring strains were isolated sporadically within species alongside effective symbionts, indicating that chromosomal lineage does not predict symbiotic potential.
Our observations suggest that the previously observed genomic diversity amongst symbiotic Mesorhizobium spp. represents only a fraction of the extant diversity of nonsymbiotic strains. The overlapping phylogeny of symbiotic and nonsymbiotic clades suggests that the major clades of Mesorhizobium diverged before the introduction of symbiosis genes. Chromosomal genes involved in symbiosis have therefore evolved largely independently of nitrogen-fixing symbiosis.
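Screening genomes for the symbiosis genes and vitamin-synthesis genes discussed above amounts to a presence/absence check on annotated gene sets. The marker genes chosen below are an illustrative assumption, not the study's actual screening criteria: `nod` and `nif` genes stand in for symbiosis, and `bio`, `nad` and `thi` genes for biotin, nicotinate and thiamine synthesis.

```python
# Hypothetical marker sets chosen for illustration only.
SYMBIOSIS_MARKERS = frozenset({"nodA", "nodB", "nodC", "nifH", "nifD", "nifK"})
VITAMIN_PATHWAYS = {
    "biotin": frozenset({"bioA", "bioB"}),
    "nicotinate": frozenset({"nadA", "nadB"}),
    "thiamine": frozenset({"thiC"}),
}


def profile_genome(genes: set):
    """Classify a genome by symbiosis-marker content and report vitamin-pathway completeness."""
    found = SYMBIOSIS_MARKERS & genes
    if found == SYMBIOSIS_MARKERS:
        status = "symbiotic"
    elif found:
        status = "partial"  # only some markers found: flag for manual inspection
    else:
        status = "nonsymbiotic"
    # A pathway counts as present only if all of its marker genes are present
    vitamins = {name: required <= genes for name, required in VITAMIN_PATHWAYS.items()}
    return status, vitamins


print(profile_genome({"recA", "gyrB", "bioA", "bioB"}))
```

A genome carrying only vitamin genes, as with the ICESym-like elements described above, is classed as nonsymbiotic but has a complete biotin pathway in this scheme.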
- Pathogens and Epidemiology
The genomic characterization of Salmonella Paratyphi A from an outbreak of enteric fever in Vadodara, India
Salmonella enterica serovars Typhi (S. Typhi) and Paratyphi A (S. Paratyphi A) are the causative agents of enteric fever, a systemic human disease with a burden of 300 000 cases per year in India. Most enteric fever cases are associated with S. Typhi, resulting in a paucity of data on S. Paratyphi A, specifically with respect to genomic surveillance and antimicrobial resistance (AMR). Here, we used whole-genome sequencing (WGS) to identify the S. Paratyphi A genotypes and AMR determinants associated with an outbreak of S. Paratyphi A in Vadodara, India, from December 2018 to December 2019. In total, 117 S. Paratyphi A isolates were cultured and genome sequenced. Most belonged to genotype 2.4.2 (72.6 % of all cases), the globally dominant genotype. The remainder belonged to genotype 2.3 (25.6 %), and only two isolates belonged to genotype 2.4.1. A single base-pair mutation in gyrA associated with reduced susceptibility to fluoroquinolones was present in all outbreak isolates. Of these, 74.35 % carried an S83F substitution and the remainder an S83Y substitution. Our surveillance study suggests that S. Paratyphi A is an emergent pathogen in South Asia that may become increasingly relevant with the introduction of Vi conjugate vaccines.
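As a quick consistency check on the reported percentages, the counts they imply can be back-calculated from the 117 sequenced isolates. These integer counts are derived here by rounding to the nearest whole isolate; they are not stated in the abstract.

```python
def implied_counts(total: int, percents: dict) -> dict:
    """Back-calculate integer counts from reported percentages (rounded to the nearest
    whole isolate) and report what is left over."""
    counts = {label: round(total * pct / 100) for label, pct in percents.items()}
    counts["remainder"] = total - sum(counts.values())
    return counts


print(implied_counts(117, {"2.4.2": 72.6, "2.3": 25.6}))
# {'2.4.2': 85, '2.3': 30, 'remainder': 2}
print(implied_counts(117, {"S83F": 74.35}))
# {'S83F': 87, 'remainder': 30}
```

The implied remainder of two isolates agrees with the two genotype 2.4.1 isolates. Since 87/117 is 74.36 %, the 74.35 % figure for S83F corresponds to 87 isolates up to rounding.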
A bittersweet fate: detection of serotype switching in Pseudomonas aeruginosa
High-risk clone types of Pseudomonas aeruginosa are problematic multidrug-resistant clones with a global distribution. However, apart from their ability to resist antimicrobial treatment, little is known about what sets these clones apart from the multitude of other clones. In the high-risk clone ST111, it has previously been shown that a specific gene replacement may have contributed to the global success of the clone. In that replacement, the native serotype biosynthetic gene cluster (O4) was replaced by a different gene cluster (O12) through horizontal gene transfer and recombination. However, the extent to which isolates undergo this type of serotype switching has not been adequately explored in P. aeruginosa. In the present study, a bioinformatics tool was developed and used to provide a first estimate of serotype switching in groups of multidrug-resistant (MDR) clinical isolates. The tool detects serotype switching by combining core-genome phylogeny with in silico serotyping. Analysis of a national survey of MDR isolates found a prevalence of 3.9 % serotype-switched isolates in the high-risk clone types ST111, ST244 and ST253. A global survey of MDR isolates was also analysed, and 2.3 % of isolates were found to have undergone a serotype switch. To further understand this process, we determined the exact boundaries of the horizontally transferred serotype O12 island. We found that the size of the serotype island correlates with the clone type of the receiving isolate. We additionally found intra-clone-type variation in size and boundaries, which suggests multiple serotype switch events. Moreover, we found that the housekeeping gene gyrA is co-transferred with the O12 serotype island, which prompted us to analyse this allele in all serotype O12 isolates. We found that 95 % of ST111 O12 isolates and 86 % of all O12 isolates had a resistant gyrA allele. The rates of resistant gyrA alleles in isolates with other prevalent serotypes are all lower.
Together, these results show that the transfer and acquisition of serotype O12 in high-risk clone ST111 has happened multiple times and may be facilitated by multiple donors, which clearly suggests a strong selection pressure for this process. However, gyrA-mediated antibiotic resistance may not be the only evolutionary driver.
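The detection idea, comparing an isolate's in silico serotype with that of its core-genome relatives, can be sketched in Python. This is a simplified illustration, not the authors' tool. A clade label such as an ST stands in for a proper core-genome phylogenetic group, and the native serotype is either supplied or assumed to be the clade majority. The isolate IDs, `clade_B` and its serotypes are made up; only ST111's native O4 serotype comes from the text above.

```python
from collections import Counter, defaultdict


def flag_serotype_switches(isolates, native=None, min_clade_size=3):
    """Flag isolates whose in silico serotype differs from their clade's native serotype.

    isolates: iterable of (isolate_id, clade, serotype); the clade label stands in
    for a core-genome phylogenetic group (here simply an ST label).
    native: optional {clade: serotype} for clades whose native serotype is known;
    otherwise the clade's majority serotype is assumed to be native.
    """
    native = dict(native or {})
    by_clade = defaultdict(list)
    for isolate_id, clade, serotype in isolates:
        by_clade[clade].append((isolate_id, serotype))

    flagged = []
    for clade, members in by_clade.items():
        expected = native.get(clade)
        if expected is None:
            if len(members) < min_clade_size:
                continue  # too few isolates to infer a native serotype
            expected = Counter(s for _, s in members).most_common(1)[0][0]
        flagged.extend((iid, clade, expected, s) for iid, s in members if s != expected)
    return flagged


isolates = [
    ("i1", "ST111", "O12"), ("i2", "ST111", "O12"), ("i3", "ST111", "O4"),
    ("i4", "clade_B", "O5"), ("i5", "clade_B", "O5"), ("i6", "clade_B", "O12"),
]
for hit in flag_serotype_switches(isolates, native={"ST111": "O4"}):
    print(hit)
```

Supplying the native serotype matters for a clone like ST111. If O12 has become the majority serotype within a clade, a majority rule would wrongly flag the remaining O4 isolates instead.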
Using a combination of short- and long-read sequencing to investigate the diversity in plasmid- and chromosomally encoded extended-spectrum beta-lactamases (ESBLs) in clinical Shigella and Salmonella isolates in Belgium
For antimicrobial resistance (AMR) surveillance, it is important not only to detect AMR genes but also to determine whether they are located on plasmids or on the chromosome, as this affects how they spread. Whole-genome sequencing (WGS) is increasingly used for AMR surveillance. However, determining the genetic context of AMR genes using only short-read sequencing is complicated. Combining short reads with long-read sequencing offers a potential solution, as it allows hybrid assemblies. Nevertheless, its use in surveillance has so far been limited. This study aimed to demonstrate its added value for AMR surveillance, using extended-spectrum beta-lactamases (ESBLs) as a case study. ESBL genes have also been reported to occur on plasmids. To gain insight into the diversity and genetic context of ESBL genes, we studied clinical isolates received by the Belgian National Reference Center between 2013 and 2018. In total, 100 ESBL-producing Shigella and 31 ESBL-producing Salmonella isolates were sequenced with MiSeq. A representative selection of 20 Shigella and six Salmonella isolates was additionally sequenced with MinION technology, allowing hybrid assembly. The blaCTX-M-15 gene was found to be responsible for a rapid rise in the ESBL Shigella phenotype from 2017 onwards. This gene was mostly detected on multi-resistance-carrying IncFII plasmids. Based on clustering, these plasmids were determined to be distinct from the plasmids circulating before 2017. They had spread to different Shigella species and, within Shigella sonnei, between multiple genotypes. Another, similar IncFII plasmid carrying blaCTX-M-27 was detected after 2017, but it spread only through clonal expansion. Matches of up to 99 % to plasmids of various bacterial hosts from all over the world were found, but global alignments indicated that direct or recent ESBL-plasmid transfers had not occurred. Most likely, travellers introduced these plasmids into Belgium, where they subsequently spread domestically.
However, a clear link to a specific country could not be made. Moreover, integration of blaCTX-M into the chromosome of two Shigella isolates was detected for the first time and shown to be related to ISEcp1. In contrast, in Salmonella, ESBL genes were found only on plasmids; blaCTX-M-55 was the most prevalent ESBL gene and IncHI2 the most prevalent plasmid type. No matching ESBL plasmids or cassettes were detected between clinical Shigella and Salmonella isolates. The hybrid assembly data allowed us to check the accuracy of plasmid prediction tools. MOB-suite showed the highest accuracy; however, these tools cannot match the accuracy of long-read and hybrid assemblies. This study illustrates the added value of hybrid assemblies for AMR surveillance. It also shows that international AMR surveillance could be improved even by a strategy in which only representative isolates of a collection undergo hybrid assembly, because this allows plasmid tracking.
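Plasmid prediction tools can be checked against hybrid assemblies, as described above, using standard classification metrics. This requires each short-read contig to have been labelled plasmid or chromosome using the hybrid assembly. The sketch assumes that labelling step has already been done and scores contigs one by one; weighting by contig length is a common alternative. Contig names and labels are hypothetical, and nothing here reflects MOB-suite's actual output format.

```python
def plasmid_call_metrics(truth: dict, predicted: dict) -> dict:
    """Contig-level accuracy, precision and recall for 'plasmid' calls.

    truth: contig -> 'plasmid' or 'chromosome', derived from a hybrid assembly.
    predicted: contig -> 'plasmid' or 'chromosome', from a short-read prediction tool.
    Only contigs present in both mappings are scored.
    """
    shared = truth.keys() & predicted.keys()
    tp = sum(truth[c] == "plasmid" and predicted[c] == "plasmid" for c in shared)
    tn = sum(truth[c] == "chromosome" and predicted[c] == "chromosome" for c in shared)
    fp = sum(truth[c] == "chromosome" and predicted[c] == "plasmid" for c in shared)
    fn = sum(truth[c] == "plasmid" and predicted[c] == "chromosome" for c in shared)
    return {
        "accuracy": (tp + tn) / len(shared) if shared else float("nan"),
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
        "recall": tp / (tp + fn) if tp + fn else float("nan"),
    }


truth = {"c1": "plasmid", "c2": "plasmid", "c3": "chromosome", "c4": "chromosome"}
predicted = {"c1": "plasmid", "c2": "chromosome", "c3": "chromosome", "c4": "chromosome"}
print(plasmid_call_metrics(truth, predicted))
```

Recall for the plasmid class is often the figure of interest for surveillance, since a missed plasmid contig can hide an ESBL gene's mobile context.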
Split k-mer analysis compared to cgMLST and SNP-based core genome analysis for detecting transmission of vancomycin-resistant enterococci: results from routine outbreak analyses across different hospitals and hospital networks in Berlin, Germany
The increase in vancomycin-resistant Enterococcus faecium (VREfm) in recent years has been partially attributed to the rise of specific clonal lineages, which have been identified throughout Germany. To date, there is no gold standard for interpreting genomic data in outbreak analyses. New genomic approaches such as split k-mer analysis (SKA) could support cluster attribution in routine outbreak investigation. This project aimed to investigate frequent clonal lineages of VREfm identified during suspected outbreaks across different hospitals. It also aimed to compare genomic approaches, including SKA, in routine outbreak investigation. We used routine outbreak laboratory data from seven hospitals and three different hospital networks in Berlin, Germany. Short-read libraries were sequenced on the Illumina MiSeq system. We determined clusters using the published Enterococcus faecium cgMLST scheme (threshold ≤20 alleles) and assigned sequence and complex types (ST, CT) using the Ridom SeqSphere+ software. For each cluster determined by cgMLST, we used pairwise core-genome SNP analysis and SKA, at thresholds of ten and seven SNPs, respectively, to further distinguish cgMLST clusters. To investigate clinical relevance, we analysed the extent to which epidemiological linkage supported the clusters determined with the different genomic approaches. Between 2014 and 2021, we sequenced 693 VREfm strains, of which 644 (93 %) were assigned to cgMLST clusters. More than 74 % (n=475) of the strains belonged to the six largest cgMLST clusters, comprising ST117, ST78 and ST80. All six clusters were detected across several years and hospitals without apparent epidemiological links. Core SNP analysis identified 44 clusters with a median cluster size of three isolates (IQR 2–7, min–max 2–63), as well as 197 singletons (41.4 % of 475 isolates).
SKA identified 67 clusters with a median cluster size of two isolates (IQR 2–4, min–max 2–19), and 261 singletons (54.9 % of 475 isolates). Of the isolate pairs attributed to clusters, epidemiological linkage was shown for 7 % (n=3064/45 596) of pairs in clusters determined by standard cgMLST, 15 % (n=1222/8500) of pairs in core SNP clusters and 51 % (n=942/1880) of pairs in SKA clusters. The proportion of epidemiological linkage differed between sequence types. For VREfm, the discriminative ability of the widely used cgMLST-based approach at ≤20 alleles difference was insufficient to rule out hospital outbreaks without further analytical methods. Cluster assignment guided by core-genome SNP analysis and the reference-free SKA was more discriminative and correlated better with obvious epidemiological linkage. This held at least at the recently published thresholds (ten and seven SNPs, respectively) and for frequent STs. Besides its higher overall discriminative power, the whole-genome approach implemented in SKA is also easier and faster to conduct and requires fewer computational resources.
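The threshold-based cluster definitions compared in this study can be illustrated with a small single-linkage sketch. Isolates fall into the same cluster if a chain of pairwise distances at or below the threshold connects them. The abstract does not state the linkage rule each tool applies, so single linkage is an assumption here, as are the isolate names and distances. The same function works for cgMLST allele differences (≤20) or SNP distances (≤10 for core SNPs, ≤7 for SKA), since only the units of the distances change. The pair count per cluster, n(n - 1)/2, is presumably the kind of quantity behind the pairwise epidemiological-linkage comparison.

```python
from collections import defaultdict


def threshold_clusters(ids, distances, threshold):
    """Single-linkage clusters: isolates are joined if any chain of pairwise
    distances <= threshold connects them. distances maps (a, b) -> distance."""
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        if d <= threshold:
            parent[find(a)] = find(b)  # union the two components

    groups = defaultdict(list)
    for i in ids:
        groups[find(i)].append(i)
    clusters = [sorted(g) for g in groups.values() if len(g) > 1]
    singletons = sorted(g[0] for g in groups.values() if len(g) == 1)
    return clusters, singletons


def within_cluster_pairs(clusters):
    """Number of isolate pairs inside clusters: n * (n - 1) / 2 per cluster."""
    return sum(len(c) * (len(c) - 1) // 2 for c in clusters)


ids = ["A", "B", "C", "D", "E"]
dist = {("A", "B"): 3, ("B", "C"): 6, ("C", "D"): 15, ("D", "E"): 30, ("A", "E"): 40}
print(threshold_clusters(ids, dist, 7))    # ([['A', 'B', 'C']], ['D', 'E'])
print(threshold_clusters(ids, dist, 20))   # ([['A', 'B', 'C', 'D']], ['E'])
```

Raising the threshold merges isolates into fewer, larger clusters, and the number of within-cluster pairs grows roughly quadratically with cluster size. This is one reason a permissive threshold can yield many pairs without epidemiological support.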
- Short Communications
- Pathogens and Epidemiology
Multiplex MinION sequencing suggests enteric adenovirus F41 genetic diversity comparable to pre-COVID-19 era
Human adenovirus F41 causes acute gastroenteritis in children and has recently been associated with an apparent increase in paediatric hepatitis of unknown aetiology in the UK, with further cases reported in multiple countries. Relatively little is known about the genetic diversity of adenovirus F41 in UK children, and it is unclear what impact, if any, the COVID-19 pandemic has had on viral diversity in the UK. Addressing these questions requires methods that allow F41 to be sequenced from clinical samples without viral culture. We therefore evaluated an overlapping-amplicon method for sequencing adenovirus genomes from clinical samples using Oxford Nanopore technology. We applied this method to a small set of adenovirus-species-F-positive extracts collected as part of standard care in the East of England region in January–May 2022. This method produced genomes with >75 % coverage in 13/22 samples and >50 % coverage in 19/22 samples. We identified two F41 lineages present in paediatric patients in the East of England in 2022. F41 genomes from paediatric hepatitis cases were available for two cases (n=2). These genomes fell within the diversity of F41 from the UK and continental Europe sequenced before and after the 2020–2021 phase of the COVID-19 pandemic. Our analyses suggest that overlapping-amplicon sequencing is an appropriate method for generating F41 genomic data from high-virus-load clinical samples. They also suggest that currently circulating F41 viral lineages were present in the UK and Europe before the COVID-19 pandemic.
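The coverage figures above (>75 % and >50 % of the genome) are breadth-of-coverage measures. The minimal Python sketch below assumes that per-position read depths against an F41 reference are already available, for example from `samtools depth -a`, which also reports zero-depth positions. It also assumes a hypothetical minimum depth of 20 reads for a position to count as covered.

```python
def coverage_breadth(depths, min_depth=20):
    """Fraction of reference positions covered by at least `min_depth` reads."""
    if not depths:
        return 0.0
    return sum(d >= min_depth for d in depths) / len(depths)


def count_above(samples, cutoff, min_depth=20):
    """Number of samples whose breadth of coverage exceeds `cutoff` (0-1)."""
    return sum(coverage_breadth(d, min_depth) > cutoff for d in samples.values())


# Toy depth profiles (100 positions each) standing in for real per-base depths.
samples = {
    "s1": [25] * 80 + [0] * 20,   # breadth 0.8
    "s2": [25] * 60 + [5] * 40,   # breadth 0.6
    "s3": [0] * 70 + [40] * 30,   # breadth 0.3
}
print(count_above(samples, 0.75), count_above(samples, 0.50))  # 1 2
```

Because amplicon dropouts leave contiguous gaps, two samples with the same breadth can differ in which genes are recoverable. Breadth is therefore a screening figure rather than a guarantee that a given region is complete.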
Multidrug-resistant toxigenic Corynebacterium diphtheriae sublineage 453 with two novel resistance genomic islands
Antimicrobial therapy is important for the case management of diphtheria, but knowledge of the emergence of multidrug resistance in Corynebacterium diphtheriae is scarce. We report the genomic features of two multidrug-resistant toxigenic isolates sampled from wounds in France 3 years apart. Both isolates were resistant to spiramycin, clindamycin, tetracycline, kanamycin and trimethoprim-sulfamethoxazole. Genes ermX, cmx, aph(3’)-Ib, aph(6)-Id, aph(3’)-Ic, aadA1, dfrA15, sul1, cmlA, cmlR and tet(33) were clustered in two genomic islands. One island consisted of two transposons and one integron; the other was flanked by two IS6100 insertion sequences. One isolate additionally carried mutations in gyrA and rpoB and was resistant to ciprofloxacin and rifampicin. Both isolates belonged to sublineage 453 (SL453), together with 25 isolates from 11 other countries (https://bigsdb.pasteur.fr/diphtheria/). SL453 is a cosmopolitan toxigenic sublineage of C. diphtheriae, a subset of which has acquired multidrug resistance. Penicillin, amoxicillin and erythromycin, which are recommended as first-line treatments for diphtheria, remain active. Even so, diphtheria surveillance should consider the risk of dissemination of multidrug-resistant strains and their genetic elements.
- Evolution and Responses to Interventions
Analysis of variola virus molecular evolution suggests an old origin of the virus consistent with historical records
Archaeovirology efforts have provided a rich portrait of the evolutionary history of variola virus (VARV, the cause of smallpox). This history is characterized by lineage extinctions and a relatively recent origin of the virus as a human pathogen (~1700 years ago, ya). This contrasts with historical records suggesting the presence of smallpox as early as 3500 ya. By analysing ancestry components in modern, historic and ancient genomes, we reveal the progressive drift of VARV lineages from a common ancestral population. We also show that a small proportion of Viking Age ancestry persisted until the 18th century. After the split of the P-I and P-II lineages, the former experienced a severe bottleneck. Regarding the emergence of VARV as a human pathogen, we revise time estimates by accounting for the time-dependent rate phenomenon. We thus estimate that VARV emerged more than 3800 ya, supporting its presence in ancient societies, as pockmarked Egyptian mummies suggest.
- Methods
- Genomic Methodologies
Leveraging comparative genomics to uncover alien genes in bacterial genomes
A significant challenge in bacterial genomics is to catalogue genes acquired through horizontal gene transfer (HGT). Both comparative genomics and sequence-composition-based methods have often been used to quantify horizontally acquired genes in bacterial genomes. Comparative genomics methods rely on completely sequenced genomes, so confidence in their predictions increases as databases become enriched in completely sequenced genomes. Recent developments in microbial genome sequencing call for a reassessment of alien genes based on the information-rich resources now available. We revisited the comparative genomics approach and developed a new algorithm for alien gene detection. Our algorithm compared favourably with existing comparative-genomics-based methods and can detect both recent and ancient transfers. It can be used as a standalone tool or in concert with other, complementary algorithms to comprehensively catalogue alien genes in bacterial genomes.
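Comparative-genomics methods of this kind generally look for a patchy phyletic distribution. A gene whose homologs are missing from most close relatives but present in distantly related taxa is a candidate for horizontal acquisition. The Python sketch below is a generic illustration of that heuristic, not the authors' algorithm. The genome sets, the 20 % cut-off and the gene names are hypothetical. A patchy pattern can also arise from gene loss, which is one reason such calls are often combined with composition-based evidence.

```python
def alien_candidates(homolog_hits, close_relatives, distant_taxa, max_close_fraction=0.2):
    """Flag genes whose homologs are rare among close relatives yet present in
    distant taxa, a patchy phyletic pattern consistent with HGT.

    homolog_hits: gene -> set of genome IDs in which a homolog was found.
    """
    if not close_relatives:
        raise ValueError("at least one close relative is required")
    candidates = []
    for gene, genomes in homolog_hits.items():
        close_fraction = len(genomes & close_relatives) / len(close_relatives)
        if close_fraction <= max_close_fraction and genomes & distant_taxa:
            candidates.append(gene)
    return sorted(candidates)


close = {"c1", "c2", "c3", "c4", "c5"}   # hypothetical same-genus genomes
distant = {"d1", "d2"}                    # hypothetical distantly related genomes
hits = {
    "geneX": {"d1"},                                  # absent from all close relatives
    "geneY": {"c1", "c2", "c3", "c4", "c5", "d1"},    # vertically inherited pattern
    "geneZ": {"c1", "d2"},                            # rare among close relatives
    "geneW": set(),                                   # no homologs anywhere (ORFan)
}
print(alien_candidates(hits, close, distant))  # ['geneX', 'geneZ']
```

Genes with no detectable homologs, such as `geneW`, are left unflagged here. Deciding how to treat such ORFans is a separate design choice.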