- Volume 7, Issue 11, 2021
- Outbreak Reports
-
- Pathogens and Epidemiology
-
-
Insights on the SARS-CoV-2 genome variability: the lesson learned in Brazil and its impacts on the future of pandemics
Since the beginning of the SARS-CoV-2 spread in Brazil, few studies have been published analysing the variability of the viral genome. Herein, we describe the dynamics of SARS-CoV-2 strains circulating in Brazil from May to September 2020, to better understand viral changes that may affect the ongoing pandemic. Our data demonstrate that some of the mutations identified are currently observed in variants of interest and variants of concern, and emphasize the importance of studying previous periods in order to comprehend the emergence of new variants. From 720 SARS-CoV-2 genome sequences, we found few sites under positive selection pressure, such as D614G (98.5 %) in the spike, which has replaced the old variant; V1167F in the spike (41 %), identified in the P.2 variant that emerged from Brazil during the period of analysis; and I292T (39 %) in the N protein. There were a few alterations in the UTRs, which was expected; however, our data suggest that the emergence of new variants was not influenced by mutations in UTR regions, since they maintained their conformational structure in most analysed sequences. In phylogenetic analysis, the spread of SARS-CoV-2 from the large urban centres to the countryside during these months could be explained by the relaxation of social isolation measures and could also be associated with possible new waves of infection. These results allow a better understanding of SARS-CoV-2 strains that have circulated in Brazil and thus provide relevant information on the potential viral changes that may have affected and/or contributed to the current and future scenario of the COVID-19 pandemic.
-
- Research Articles
-
- Genomic Methodologies
-
-
Identification of isolated or mixed strains from long reads: a challenge met on Streptococcus thermophilus using a MinION sequencer
This study aimed to provide efficient recognition of bacterial strains on personal computers from MinION (Nanopore) long read data. Thanks to the fall in sequencing costs, the identification of bacteria can now proceed by whole genome sequencing. MinION is a fast, but highly error-prone sequencing device and it is a challenge to successfully identify the strain content of unknown simple or complex microbial samples. The task is heavily constrained by memory management and by fast access to the reads and genome fragments. Our strategy involves three steps: indexing of known genomic sequences for a given or several bacterial species; a request process to assign a read to a strain by matching it to the closest reference genomes; and a final step looking for a minimum set of strains that best explains the observed reads. We have applied our method, called ORI, on 77 strains of Streptococcus thermophilus. We worked on several genomic distances and obtained a detailed classification of the strains, together with a criterion that allows merging of what we termed ‘sibling’ strains, only separated by a few mutations. Overall, isolated strains can be safely recognized from MinION data. For mixtures of several non-sibling strains, results depend on strain abundance.
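The final step described above, finding a small set of strains that explains the observed reads, can be illustrated with a greedy set-cover heuristic. This is only a sketch under stated assumptions: the abstract does not say how ORI solves this step, so the greedy rule, the function name `greedy_min_strain_set` and the input layout (read identifier mapped to the set of strains it matched) are all hypothetical.

```python
def greedy_min_strain_set(read_hits):
    """Pick a small set of strains that together explain every assigned read.

    read_hits maps a read identifier to the set of strain names the read
    matched (hypothetical layout, not ORI's real data structure).
    Greedy set cover is a stand-in heuristic, not ORI's published method.
    """
    # Reads that matched no strain cannot be explained and are skipped.
    uncovered = {read for read, strains in read_hits.items() if strains}
    chosen = []
    while uncovered:
        # Count how many still-unexplained reads each strain would explain.
        counts = {}
        for read in uncovered:
            for strain in read_hits[read]:
                counts[strain] = counts.get(strain, 0) + 1
        # Take the strain explaining the most reads; break ties by name.
        best = min(counts, key=lambda s: (-counts[s], s))
        chosen.append(best)
        uncovered = {r for r in uncovered if best not in read_hits[r]}
    return chosen


example_hits = {
    "read1": {"strainA", "strainB"},
    "read2": {"strainA"},
    "read3": {"strainC"},
    "read4": {"strainA", "strainC"},
    "read5": set(),  # unassigned read, ignored
}
selected = greedy_min_strain_set(example_hits)
```

A greedy cover is not guaranteed to be minimal, and it ignores read abundance, which the abstract notes does matter for mixtures.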
-
-
-
Delving into defence: identifying the Pseudomonas protegens Pf-5 gene suite involved in defence against secreted products of fungal, oomycete and bacterial rhizosphere competitors
Competitive behaviours of plant growth promoting rhizobacteria (PGPR) are integral to their ability to colonize and persist on plant roots and outcompete phytopathogenic fungi, oomycetes and bacteria. PGPR engage in a range of antagonistic behaviours that have been studied in detail, such as the production and secretion of compounds inhibitory to other microbes. In contrast, their defensive activities that enable them to tolerate exposure to inhibitory compounds produced by their neighbours are less well understood. In this study, the genes involved in the Pseudomonas protegens Pf-5 response to metabolites from eight diverse rhizosphere competitor organisms, Fusarium oxysporum, Rhizoctonia solani, Gaeumannomyces graminis var. tritici, Pythium spinosum, Bacillus subtilis QST713, Pseudomonas sp. Q2-87, Streptomyces griseus and Streptomyces bikiniensis subspecies bikiniensis, were examined. Proximity induced excreted metabolite responses were confirmed for Pf-5 with all partner organisms through HPLC before culturing a dense Pf-5 transposon mutant library adjacent to each of these microbes. This was followed by transposon-directed insertion site sequencing (TraDIS), which identified genes that influence Pf-5 fitness during these competitive interactions. A set of 148 genes was identified that were associated with increased fitness during competition, including cell surface modification, electron transport, nucleotide metabolism, as well as regulatory genes. In addition, 51 genes were identified for which loss of function resulted in fitness gains during competition. These included genes involved in flagella biosynthesis and cell division. Considerable overlap was observed in the set of genes observed to provide a fitness benefit during competition with all eight test organisms, indicating commonalities in the competitive response to phylogenetically diverse micro-organisms and providing new insight into competitive processes likely to take place in the rhizosphere.
-
-
-
Evaluation of whole-genome sequencing-based subtyping methods for the surveillance of Shigella spp. and the confounding effect of mobile genetic elements in long-term outbreaks
Many public health laboratories across the world have implemented whole-genome sequencing (WGS) for the surveillance and outbreak detection of foodborne pathogens. PulseNet-affiliated laboratories have determined that most single-strain foodborne outbreaks are contained within 0–10 multi-locus sequence typing (MLST)-based allele differences and/or core genome single-nucleotide variants (SNVs). In addition to being a food- and travel-associated outbreak pathogen, most Shigella spp. cases occur through continuous person-to-person transmission, predominantly involving men who have sex with men (MSM), leading to long-term and recurrent outbreaks. Continuous transmission patterns coupled to genetic evolution under antibiotic treatment pressure require an assessment of existing WGS-based subtyping methods and interpretation criteria for cluster inclusion/exclusion. An evaluation of 4 WGS-based subtyping methods [SNVPhyl, coreMLST, core genome MLST (cgMLST) and whole-genome MLST (wgMLST)] was performed on 9 foodborne-, travel- and MSM-related retrospective outbreaks from a collection of 91 Shigella flexneri and 232 Shigella sonnei isolates to determine the methods’ epidemiological concordance, discriminatory power, robustness and ability to generate stable interpretation criteria. The discriminatory powers were ranked as follows: coreMLST
Shigella spp. outbreaks respect the standard 0–10 allele/SNV guideline; however, mobile genetic element (MGE)-encoded loci caused inflated genetic variation and discrepant phylogenies for prolonged MSM-related S. sonnei outbreaks via wgMLST. The S. sonnei correlation coefficients of wgMLST were also the lowest at 0.680, 0.703 and 0.712 for SNVPhyl, coreMLST and cgMLST, respectively. Plasmid maintenance, mobilization and conjugation-associated genes were found to be the main source of genetic distance inflation in addition to prophage-related genes. Duplicated alleles arising from the repeated nature of IS elements were also responsible for many false cg/wgMLST differences. The coreMLST approach was shown to be the most robust, followed by SNVPhyl and wgMLST for inter-laboratory comparability. Our results highlight the need for validating species-specific subtyping methods based on microbial genome plasticity and outbreak dynamics in addition to the importance of filtering confounding MGEs for cluster detection.
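The 0–10 allele-difference guideline and the effect of filtering MGE-encoded loci can be made concrete with a small sketch. It assumes allele profiles stored as dictionaries from locus name to allele number; the function names, the locus names and the `exclude_loci` parameter are illustrative and do not come from any of the evaluated tools.

```python
def allele_distance(profile_a, profile_b, exclude_loci=frozenset()):
    """Count loci at which two allele profiles carry different alleles.

    Profiles map locus name -> allele number (None means not called).
    Loci absent or uncalled in either profile are ignored, as are loci
    listed in exclude_loci (for example, loci known to sit on MGEs).
    """
    differences = 0
    for locus in profile_a.keys() & profile_b.keys():
        if locus in exclude_loci:
            continue
        a, b = profile_a[locus], profile_b[locus]
        if a is None or b is None:
            continue
        if a != b:
            differences += 1
    return differences


def same_cluster(profile_a, profile_b, max_differences=10,
                 exclude_loci=frozenset()):
    """Apply an allele-difference threshold (default 10, as in the guideline)."""
    return allele_distance(profile_a, profile_b, exclude_loci) <= max_differences


# Hypothetical isolates: one chromosomal difference, one plasmid-borne one.
isolate_1 = {"locus1": 1, "locus2": 2, "locus3": 3, "plasmid_locus": 5}
isolate_2 = {"locus1": 1, "locus2": 9, "locus3": None, "plasmid_locus": 6}
mge_loci = frozenset({"plasmid_locus"})
```

With these toy profiles the distance drops from 2 to 1 once the plasmid-borne locus is excluded, which mirrors, on a tiny scale, how MGE loci can inflate wgMLST distances.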
-
-
-
A comparative study of pan-genome methods for microbial organisms: Acinetobacter baumannii pan-genome reveals structural variation in antimicrobial resistance-carrying plasmids
Microbial organisms have diverse populations, where using a single linear reference sequence in comparative studies introduces reference bias in downstream analyses, and leads to a failure to account for variability in the population. Recently, pan-genome graphs have emerged as an alternative to the traditional linear reference with many successful applications and a rapid increase in the number of methods available in the literature. Despite this enthusiasm, there has been no attempt at exploring these graph construction methods in depth, demonstrating their practical use. In this study, we aim to develop a general guide to help researchers who may want to incorporate pan-genomes in their analyses of microbial organisms. We evaluated the state-of-the-art pan-genome construction tools to model a collection of 70 Acinetobacter baumannii strains. Our results suggest that all tools produced pan-genome graphs conforming to our expectations based on previous literature, and that their approach to homologue detection is likely to be the most influential in determining the final size and complexity of the pan-genome. The graphs overlapped most in the core pan-genome content while the cloud genes varied significantly among tools. We propose an alternative approach for pan-genome construction by combining two of the tools, Panaroo and Ptolemy, to further exploit them in downstream analyses, and demonstrate the effectiveness of our pipeline for structural variant calling in beta-lactam resistance genes in the same set of A. baumannii isolates, identifying various transposon structures for carbapenem resistance in the chromosome as well as in plasmids. We identify a novel plasmid structure in two multidrug-resistant clinical isolates that had previously been studied, and which could be important for their resistance phenotypes.
-
-
-
Evaluation of whole-genome sequence data analysis approaches for short- and long-read sequencing of Mycobacterium tuberculosis
Whole-genome sequencing (WGS) of Mycobacterium tuberculosis (MTB) isolates can be used to get an accurate diagnosis, to guide clinical decision making, to control tuberculosis (TB) and for outbreak investigations. We evaluated the performance of long-read (LR) and/or short-read (SR) sequencing for anti-TB drug-resistance prediction using the TBProfiler and Mykrobe tools, the fraction of genome recovery, assembly accuracies and the robustness of two typing approaches based on core-genome SNP (cgSNP) typing and core-genome multi-locus sequence typing (cgMLST). Most of the discrepancies between phenotypic drug-susceptibility testing (DST) and drug-resistance prediction were observed for the first-line drugs rifampicin, isoniazid, pyrazinamide and ethambutol, mainly with LR sequence data. Resistance prediction to second-line drugs made by both TBProfiler and Mykrobe tools with SR- and LR-sequence data were in complete agreement with phenotypic DST except for one isolate. The SR assemblies were more accurate than the LR assemblies, having significantly (P<0.05) fewer indels and mismatches per 100 kbp. However, the hybrid and LR assemblies had slightly higher genome fractions. For LR assemblies, Canu followed by Racon, and Medaka polishing was the most accurate approach. The cgSNP approach, based on either reads or assemblies, was more robust than the cgMLST approach, especially for LR sequence data. In conclusion, anti-TB drug-resistance prediction, particularly with only LR sequence data, remains challenging, especially for first-line drugs. In addition, SR assemblies appear more accurate than LR ones, and reproducible phylogeny can be achieved using cgSNP approaches.
-
-
-
Evaluation of WGS performance for bacterial pathogen characterization with the Illumina technology optimized for time-critical situations
Whole genome sequencing (WGS) has become the reference standard for bacterial outbreak investigation and pathogen typing, providing a resolution unattainable with conventional molecular methods. Data generated with Illumina sequencers can, however, only be analysed after the sequencing run has finished, thereby losing valuable time during emergency situations. We evaluated both the effect of decreasing overall run time and a protocol to transfer and convert intermediary files generated by Illumina sequencers, enabling real-time data analysis for multiple samples that are part of the same ongoing sequencing run as soon as the forward reads have been sequenced. To facilitate implementation for laboratories operating under strict quality systems, extensive validation of several bioinformatics assays (16S rRNA species confirmation, gene detection against virulence factor and antimicrobial resistance databases, SNP-based antimicrobial resistance detection, serotype determination, and core genome multilocus sequence typing) for three bacterial pathogens (Mycobacterium tuberculosis, Neisseria meningitidis, and Shiga-toxin producing Escherichia coli) was performed by evaluating performance as a function of the two most critical sequencing parameters, i.e. read length and coverage. For the majority of evaluated bioinformatics assays, actionable results could be obtained between 14 and 22 h of sequencing, decreasing the overall sequencing-to-results time by more than half. This study aids in reducing the turn-around time of WGS analysis by facilitating a faster response in time-critical scenarios and provides recommendations for time-optimized WGS with respect to required read length and coverage to achieve a minimum level of performance for the considered bioinformatics assay(s), which can also be used to maximize the cost-effectiveness of routine surveillance sequencing when response time is not essential.
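Read length and coverage are linked by the standard relation coverage = (number of reads × read length) / genome size. The sketch below applies that relation; the function names and the genome size and read counts in the example are illustrative assumptions, not values from the study.

```python
import math


def expected_coverage(n_reads, read_length_bp, genome_size_bp):
    """Average depth of coverage: total sequenced bases over genome length."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage, read_length_bp, genome_size_bp):
    """Smallest number of reads that reaches the target average coverage."""
    if read_length_bp <= 0:
        raise ValueError("read_length_bp must be positive")
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


# Illustrative numbers only: a 5 Mb genome sequenced with 150 bp reads.
depth = expected_coverage(1_000_000, 150, 5_000_000)
needed = reads_needed(30, 150, 5_000_000)
```

Because shorter reads need proportionally more reads for the same depth, stopping a run early trades read length against the number of reads required.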
-
- Functional Genomics and Microbe–Niche Interactions
-
-
Whole set of constitutive promoters for RpoN sigma factor and the regulatory role of its enhancer protein NtrC in Escherichia coli K-12
The promoter selectivity of Escherichia coli RNA polymerase (RNAP) is determined by its promoter-recognition sigma subunit. The model prokaryote E. coli K-12 contains seven species of the sigma subunit, each recognizing a specific set of promoters. Using genomic SELEX (gSELEX) screening in vitro, we identified the whole set of ‘constitutive’ promoters recognized by the reconstituted RNAP holoenzyme alone, containing RpoD (σ70), RpoS (σ38), RpoH (σ32), RpoF (σ28) or RpoE (σ24), in the absence of other supporting regulatory factors. In contrast, RpoN sigma (σ54), involved in expression of nitrogen-related genes and also other cellular functions, requires an enhancer (or activator) protein, such as NtrC, for transcription initiation. In this study, a series of gSELEX screenings were performed to search for promoters recognized by the RpoN RNAP holoenzyme in the presence and absence of the major nitrogen response enhancer NtrC, the best-characterized enhancer. Based on the RpoN holoenzyme-binding sites, a total of 44 to 61 putative promoters were identified, which were recognized by the RpoN holoenzyme alone. In the presence of the enhancer NtrC, the recognition target increased to 61–81 promoters. Consensus sequences of promoters recognized by RpoN holoenzyme in the absence and presence of NtrC were determined. The promoter activity of a set of NtrC-dependent and -independent RpoN promoters was verified in vivo under nitrogen starvation, in the presence and absence of RpoN and/or NtrC. The promoter activity of some RpoN-recognized promoters increased in the absence of RpoN or NtrC, supporting the concept that the promoter-bound NtrC-enhanced RpoN holoenzyme functions as a repressor against RpoD holoenzyme. Based on our findings, we propose a model in which the RpoN holoenzyme fulfils the dual role of repressor and transcriptase for the same set of genes.
We also propose that the promoter recognized by RpoN holoenzyme in the absence of enhancers is the ‘repressive’ promoter. The presence of high-level RpoN sigma in growing E. coli K-12 in rich medium may be related to the repression role of a set of genes needed for the utilization of ammonia as a nitrogen source in poor media. The list of newly identified regulatory targets of RpoN provides insight into E. coli survival under nitrogen-depleted conditions in nature.
-
-
-
Genome editing reveals that pSCL4 is required for chromosome linearity in Streptomyces clavuligerus
Streptomyces clavuligerus is an industrially important actinomycete whose genetic manipulation is limited by low transformation and conjugation efficiencies, low levels of recombination of introduced DNA, and difficulty in obtaining consistent sporulation. We describe the construction and application of versatile vectors for Cas9-mediated genome editing of this strain. To design spacer sequences with confidence, we derived a highly accurate genome assembly for an isolate of the type strain (ATCC 27064). This yielded a chromosome assembly (6.75 Mb) plus assemblies for pSCL4 (1795 kb) and pSCL2 (149 kb). The strain also carries pSCL1 (12 kb), but its small size resulted in only partial sequence coverage. The previously described pSCL3 (444 kb) is not present in this isolate. Using our Cas9 vectors, we cured pSCL4 with high efficiency by targeting the plasmid’s parB gene. Five of the resulting pSCL4-cured isolates were characterized and all showed impaired sporulation. Shotgun genome sequencing of each of these derivatives revealed large deletions at the ends of the chromosomes in all of them, and for two clones sufficient sequence data was obtained to show that the chromosome had circularized. Taken together, these data indicate that pSCL4 is essential for the structural stability of the linear chromosome.
-
-
-
Genome evolution drives transcriptomic and phenotypic adaptation in Pseudomonas aeruginosa during 20 years of infection
The opportunistic pathogen Pseudomonas aeruginosa chronically infects the lungs of patients with cystic fibrosis (CF). During infection the bacteria evolve and adapt to the lung environment. Here we use genomic, transcriptomic and phenotypic approaches to compare multiple isolates of P. aeruginosa collected more than 20 years apart during a chronic infection in a CF patient. Complete genome sequencing of the isolates, using short- and long-read technologies, showed that a genetic bottleneck occurred during infection and was followed by diversification of the bacteria. A 125 kb deletion, a 0.9 Mb inversion and hundreds of smaller mutations occurred during evolution of the bacteria in the lung, with an average rate of 17 mutations per year. Many of the mutated genes are associated with infection or antibiotic resistance. RNA sequencing was used to compare the transcriptomes of an earlier and a later isolate. Substantial reprogramming of the transcriptional network had occurred, affecting multiple genes that contribute to continuing infection. Changes included greatly reduced expression of flagellar machinery and increased expression of genes for nutrient acquisition and biofilm formation, as well as altered expression of a large number of genes of unknown function. Phenotypic studies showed that most later isolates had increased cell adherence and antibiotic resistance, reduced motility, and reduced production of pyoverdine (an iron-scavenging siderophore), consistent with genomic and transcriptomic data. The approach of integrating genomic, transcriptomic and phenotypic analyses reveals, and helps to explain, the plethora of changes that P. aeruginosa undergoes to enable it to adapt to the environment of the CF lung during a chronic infection.
-
-
-
Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification
Command-line annotation software tools have continuously gained popularity compared to centralized online services due to the worldwide increase of sequenced bacterial genomes. However, results of existing command-line software pipelines heavily depend on taxon-specific databases or sufficiently well annotated reference genomes. Here, we introduce Bakta, a new command-line software tool for the robust, taxon-independent, thorough and, nonetheless, fast annotation of bacterial genomes. Bakta conducts a comprehensive annotation workflow including the detection of small proteins taking into account replicon metadata. The annotation of coding sequences is accelerated via an alignment-free sequence identification approach that in addition facilitates the precise assignment of public database cross-references. Annotation results are exported in GFF3 and International Nucleotide Sequence Database Collaboration (INSDC)-compliant flat files, as well as comprehensive JSON files, facilitating automated downstream analysis. We compared Bakta to other rapid contemporary command-line annotation software tools in both targeted and taxonomically broad benchmarks including isolates and metagenomic-assembled genomes. We demonstrated that Bakta outperforms other tools in terms of functional annotations, the assignment of functional categories and database cross-references, whilst providing comparable wall-clock runtimes. Bakta is implemented in Python 3 and runs on MacOS and Linux systems. It is freely available under a GPLv3 license at https://github.com/oschwengers/bakta. An accompanying web version is available at https://bakta.computational.bio.
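Because GFF3 is a plain tab-separated format with nine columns per feature line, exported annotations lend themselves to simple automated checks. The following minimal sketch tallies feature types from GFF3 text; it relies only on the general GFF3 layout, not on anything specific to Bakta, and the function name and sample records are invented for illustration.

```python
from collections import Counter


def count_feature_types(gff3_text):
    """Tally the 'type' column (third field) of every GFF3 feature line."""
    counts = Counter()
    for line in gff3_text.splitlines():
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a well-formed feature line
        counts[columns[2]] += 1
    return counts


# Invented records, used only to exercise the parser.
sample_gff3 = "\n".join([
    "##gff-version 3",
    "contig_1\texample\tCDS\t10\t400\t.\t+\t0\tID=cds_1",
    "contig_1\texample\tCDS\t500\t900\t.\t-\t0\tID=cds_2",
    "contig_1\texample\ttRNA\t1000\t1075\t.\t+\t.\tID=trna_1",
    "##FASTA",
    ">contig_1",
    "ATGC",
])
feature_counts = count_feature_types(sample_gff3)
```

For production use, an established GFF3 parser is likely a better choice, since this sketch does not decode escaped characters or parse the attribute column.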
-
- Microbial Communities
-
-
Massively parallel transposon mutagenesis identifies temporally essential genes for biofilm formation in Escherichia coli
Biofilms complete a life cycle where cells aggregate, grow and produce a structured community before dispersing to colonize new environments. Progression through this life cycle requires temporally controlled gene expression to maximize fitness at each stage. Previous studies have largely focused on identifying genes essential for the formation of a mature biofilm; here, we present an insight into the genes involved at different stages of biofilm formation. We used TraDIS-Xpress, a massively parallel transposon mutagenesis approach using transposon-located promoters to assay the impact of disruption or altered expression of all genes in the genome on biofilm formation. We identified 48 genes that affected the fitness of cells growing in a biofilm, including genes with known roles and those not previously implicated in biofilm formation. Regulation of type 1 fimbriae and motility were important at all time points, adhesion and motility were important for the early biofilm, whereas matrix production and purine biosynthesis were only important as the biofilm matured. We found strong temporal contributions to biofilm fitness for some genes, including some where expression changed between being beneficial or detrimental depending on the stage at which they are expressed, including dksA and dsbA. Novel genes implicated in biofilm formation included zapE and truA involved in cell division, maoP in chromosome organization, and yigZ and ykgJ of unknown function. This work provides new insights into the requirements for successful biofilm formation through the biofilm life cycle and demonstrates the importance of understanding expression and fitness through time.
-
- Pathogens and Epidemiology
-
-
Genetic diversity and transmission patterns of Burkholderia pseudomallei on Hainan island, China, revealed by a population genomics analysis
Burkholderia pseudomallei is a Gram-negative soil-dwelling bacillus that causes melioidosis, a frequently fatal infectious disease, in tropical and subtropical regions. Previous studies have identified the overall genetic and evolutionary characteristics of B. pseudomallei on a global scale, including its origin and transmission routes. However, beyond its known hyperendemicity foci in northern Australia and Southeast Asia, the distribution and genetic characteristics of B. pseudomallei in most tropical regions remain poorly understood, including in southern China. Here, we sequenced the genomes of 122 B. pseudomallei strains collected from Hainan, an island in southern China, in 2002–2018, to investigate the population structure, relationships with global strains, local epidemiology, and virulence and antimicrobial-resistance factors. A phylogenetic analysis and hierarchical clustering divided the Hainan strains into nine phylogenic groups (PGs), 80 % of which were concentrated within five major groups (group 1: corresponding to minor sequence types [STs], 12.3 %; group 3: ST46 and ST50, 31.1 %; group 9: ST58, 13.1 %; group 11: ST55, 8.2 %; group 15: mainly ST658, 15.6%). A phylogenetic analysis that included global strains suggested that B. pseudomallei in Hainan originated from Southeast Asian countries, transmitted in multiple historical importation events. We also identified several mutual transmission events between Hainan and Southeast Asian countries in recent years, including three importation events from Thailand and Singapore to Hainan and three exportation events from Hainan to Singapore, Malaysia, and Taiwan island. A statistical analysis of the temporal distribution showed that the Hainan strains of groups 3, 9, and 15 have dominated the disease epidemic locally in the last 5 years. 
The spatial distribution of the Hainan strains demonstrated that some PGs are distributed in different cities on Hainan island, and by combining phylogenic and geographic distribution information, we detected 21 between-city transmission events, indicating frequent local transmission. The detection of virulence factor genes showed that 56 % of the Hainan strains in group 1 encode a B. pseudomallei-specific adherence factor, boaB, confirming the specific pathogenic characteristics of the Hainan strains in group 1. An analysis of the antimicrobial-resistance potential of B. pseudomallei showed that various kinds of alterations were identified in clinically relevant antibiotic resistance factors, such as AmrR, PenA and PBP3. Our results clarify the population structure, local epidemiology, and pathogenic characteristics of B. pseudomallei in Hainan, providing further insight into its regional and global transmission networks and improving our knowledge of its global phylogeography.
-
-
-
Biomolecule sulphation and novel methylations related to Guillain-Barré syndrome-associated Campylobacter jejuni serotype HS:19
Campylobacter jejuni strains that produce sialylated lipooligosaccharides (LOS) can cause the immune-mediated disease Guillain-Barré syndrome (GBS). The risk of GBS after infection with C. jejuni Penner serotype HS:19 is estimated to be at least six times higher than the average risk. Aside from LOS biosynthesis genes, genomic characteristics that promote an increased risk for GBS following C. jejuni HS:19 infection remain uncharacterized. We hypothesized that strains with the HS:19 serotype have unique genomic features that explain the increased risk for GBS. We performed genome sequencing, alignments, single nucleotide polymorphism analysis and methylome characterization on a subset, and pan-genome analysis on a large number of genomes to compare HS:19 with non-HS:19 C. jejuni genome sequences. Comparison of 36 C. jejuni HS:19 with 874 C. jejuni non-HS:19 genome sequences led to the identification of three single genes and ten clusters containing contiguous genes that were significantly associated with C. jejuni HS:19. One gene cluster of seven genes, localized downstream of the capsular biosynthesis locus, was related to sulphation of biomolecules. This cluster also encoded the campylobacter sialyl transferase Cst-I. Interestingly, sulphated bacterial biomolecules such as polysaccharides can promote immune responses and, therefore, (in the presence of sialic acid) may play a role in the development of GBS. Additional gene clusters included those involved in persistence-mediated pathogenicity and gene clusters involved in restriction-modification systems. Furthermore, characterization of methylomes of two HS:19 strains exhibited novel methylation patterns (5′-CATG-3′ and 5′-m6AGTNNNNNNRTTG-3′) that could differentially affect gene-expression patterns of C. jejuni HS:19 strains. Our study provides novel insight into specific genetic features and possible virulence factors of C. jejuni associated with the HS:19 serotype that may explain the increased risk of GBS.
-
-
-
Invasive atypical non-typhoidal Salmonella serovars in The Gambia
Invasive non-typhoidal Salmonella (iNTS) disease continues to be a significant public health problem in sub-Saharan Africa. Common clinical misdiagnosis, antimicrobial resistance, high case fatality and lack of a vaccine make iNTS a priority for global health research. Using whole genome sequence analysis of 164 invasive Salmonella isolates obtained through population-based surveillance between 2008 and 2016, we conducted genomic analysis of the serovars causing invasive Salmonella diseases in rural Gambia. The incidence of iNTS varied over time. The proportion of atypical serovars causing disease increased over time from 40 to 65 % compared to the typical serovars Enteritidis and Typhimurium that decreased from 30 to 12 %. Overall iNTS case fatality was 10%, but case fatality associated with atypical iNTS alone was 10 %. Genetic virulence factors were identified in 14/70 (20 %) typical serovars and 45/68 (66 %) of the atypical serovars and were associated with: invasion, proliferation and/or translocation (Clade A); and host colonization and immune modulation (Clade G). Among Enteritidis isolates, 33/40 were resistant to four or more of the antimicrobials tested, except ciprofloxacin, to which all isolates were susceptible. Resistance was low in Typhimurium isolates, but all 16 isolates were resistant to gentamicin. The increase in incidence and proportion of iNTS disease caused by atypical serovars is concerning. The increased proportion of atypical serovars and the high associated case fatality may be related to acquisition of specific genetic virulence factors. These factors may provide a selective advantage to the atypical serovars. Investigations should be conducted elsewhere in Africa to identify potential changes in the distribution of iNTS serovars and the extent of these virulence elements.
-
-
-
Genome-wide association study of gastric cancer- and duodenal ulcer-derived Helicobacter pylori strains reveals discriminatory genetic variations and novel oncoprotein candidates
Vo Phuoc Tuan, Koji Yahara, Ho Dang Quy Dung, Tran Thanh Binh, Pham Huu Tung, Tran Dinh Tri, Ngo Phuong Minh Thuan, Vu Van Khien, Tran Thi Huyen Trang, Bui Hoang Phuc, Evariste Tshibangu-Kabamba, Takashi Matsumoto, Junko Akada, Rumiko Suzuki, Tadayoshi Okimoto, Masaaki Kodama, Kazunari Murakami, Hirokazu Yano, Masaki Fukuyo, Noriko Takahashi, Mototsugu Kato, Shin Nishiumi, Takashi Azuma, Yoshitoshi Ogura, Tetsuya Hayashi, Atsushi Toyoda, Ichizo Kobayashi and Yoshio Yamaoka
Genome-wide association studies (GWASs) can reveal genetic variations associated with a phenotype in the absence of any hypothesis of candidate genes. The problem of false-positive sites linked with the responsible site might be bypassed in bacteria with a high homologous recombination rate, such as Helicobacter pylori, which causes gastric cancer. We conducted a small-sample GWAS (125 gastric cancer cases and 115 controls) followed by prediction of gastric cancer and control (duodenal ulcer) H. pylori strains. We identified 11 single nucleotide polymorphisms (eight amino acid changes) and three DNA motifs that, combined, allowed effective disease discrimination. They were often informative of the underlying molecular mechanisms, such as electric charge alteration at the ligand-binding pocket, alteration in subunit interaction, and mode-switching of DNA methylation. We also identified three novel virulence factors/oncoprotein candidates. These results provide both defined targets for further informatic and experimental analyses to gain insights into gastric cancer pathogenesis and a basis for identifying a set of biomarkers for distinguishing these H. pylori-related diseases.
-
-
-
Genome structural variation in Escherichia coli O157:H7
The human zoonotic pathogen Escherichia coli O157:H7 is defined by its extensive prophage repertoire including those that encode Shiga toxin, the factor responsible for inducing life-threatening pathology in humans. As well as introducing genes that can contribute to the virulence of a strain, prophage can enable the generation of large-chromosomal rearrangements (LCRs) by homologous recombination. This work examines the types and frequencies of LCRs across the major lineages of the O157:H7 serotype. We demonstrate that LCRs are a major source of genomic variation across all lineages of E. coli O157:H7 and by using both optical mapping and Oxford Nanopore long-read sequencing prove that LCRs are generated in laboratory cultures started from a single colony and that these variants can be recovered from colonized cattle. LCRs are biased towards the terminus region of the genome and are bounded by specific prophages that share large regions of sequence homology associated with the recombinational activity. RNA transcriptional profiling and phenotyping of specific structural variants indicated that important virulence phenotypes such as Shiga-toxin production, type-3 secretion and motility can be affected by LCRs. In summary, E. coli O157:H7 has acquired multiple prophage regions over time that act to continually produce structural variants of the genome. These findings raise important questions about the significance of this prophage-mediated genome contingency to enhance adaptability between environments.
-
-
-
Comprehensive and accurate genetic variant identification from contaminated and low-coverage Mycobacterium tuberculosis whole genome sequencing data
Improved understanding of the genomic variants that allow Mycobacterium tuberculosis (Mtb) to acquire drug resistance, or tolerance, and increase its virulence is an important factor in controlling the current tuberculosis epidemic. Current approaches to Mtb sequencing, however, cannot reveal Mtb’s full genomic diversity due to the strict requirements of low contamination levels, high Mtb sequence coverage and elimination of complex regions. We have developed the XBS (compleX Bacterial Samples) bioinformatics pipeline, which implements joint calling and machine-learning-based variant filtering tools to specifically improve variant detection in the important Mtb samples that do not meet these criteria, such as those from unbiased sputum samples. Using novel simulated datasets, which permit exact accuracy verification, XBS was compared to the UVP and MTBseq pipelines. Accuracy statistics showed that all three pipelines performed equally well for sequence data that resemble those obtained from culture isolates of high depth of coverage and low-level contamination. In the complex genomic regions, however, XBS accurately identified 9.0 % more SNPs and 8.1 % more single nucleotide insertions and deletions than the WHO-endorsed unified analysis variant pipeline. XBS also had superior accuracy for sequence data that resemble those obtained directly from sputum samples, where depth of coverage is typically very low and contamination levels are high. XBS was the only pipeline not affected by low depth of coverage (5–10×), type of contamination and excessive contamination levels (>50 %).
Simulation results were confirmed using whole genome sequencing (WGS) data from clinical samples, which showed the superior performance of XBS, with a higher sensitivity (98.8 %) when analysing culture isolates and identification of 13.9 % more variable sites in WGS data from sputum samples as compared to MTBseq, without evidence for false positive variants when rRNA regions were excluded. The XBS pipeline facilitates sequencing of less-than-perfect Mtb samples. These advances will benefit future clinical applications of Mtb sequencing, especially WGS directly from clinical specimens, thereby avoiding in vitro biases and making many more samples available for drug resistance and other genomic analyses. The additional genetic resolution and increased sample success rate will improve genome-wide association studies and sequence-based transmission studies.
-
-
-
SARS-CoV-2 variants of concern dominate in Lahore, Pakistan in April 2021
Muhammad Bilal Sarwar, Muhammad Yasir, Nabil-Fareed Alikhan, Nadeem Afzal, Leonardo de Oliveira Martins, Thanh Le Viet, Alexander J. Trotter, Sophie J. Prosolek, Gemma L. Kay, Ebenezer Foster-Nyarko, Steven Rudder, David J. Baker, Sidra Tul Muntaha, Muhammad Roman, Mark A. Webber, Almina Shafiq, Bilquis Shabbir, Javed Akram, Andrew J. Page and Shah Jahan
The SARS-CoV-2 pandemic continues to expand globally, with case numbers rising in many areas of the world, including the Indian sub-continent. Pakistan has one of the world’s largest populations, of over 200 million people, and is experiencing a severe third wave of infections caused by SARS-CoV-2 that began in March 2021. So far during the third wave, only 12 SARS-CoV-2 genomes have been collected in Pakistan, and nine of these are from Islamabad. This highlights the need for more genome sequencing to allow surveillance of variants in circulation. In fact, more genomes are available from travellers with a travel history from Pakistan than from within the country itself. We thus aimed to provide a snapshot assessment of circulating lineages in Lahore and surrounding areas, which have a combined population of 11.1 million. Within a week of April 2021, 102 samples were sequenced. The samples were randomly collected from two hospitals with a diagnostic PCR cutoff value of less than 25 cycles. Analysis of the lineages shows that the Alpha variant of concern (first identified in the UK) dominates, accounting for 97.9 % (97/99) of cases, with the Beta variant of concern (first identified in South Africa) accounting for 2.0 % (2/99) of cases. No other lineages were observed. In-depth analysis of the Alpha lineages indicated multiple separate introductions and subsequent establishment within the region. Eight samples were identical to genomes observed in Europe (seven UK, one Switzerland), indicating recent transmission.
Genomes of other samples show evidence that these have evolved, indicating sustained transmission over a period of time either within Pakistan or other countries with low-density genome sequencing. Vaccines remain effective against Alpha; however, the low level of Beta, against which some vaccines are less effective, demonstrates the requirement for continued prospective genomic surveillance.
-
-
-
A species-wide genetic atlas of antimicrobial resistance in Clostridioides difficile
Antimicrobial resistance (AMR) plays an important role in the pathogenesis and spread of Clostridioides difficile infection (CDI), the leading healthcare-related gastrointestinal infection in the world. An association between AMR and CDI outbreaks is well documented; however, data are limited to a few ‘epidemic’ strains in specific geographical regions. Here, through detailed analysis of 10 330 publicly available C. difficile genomes from strains isolated worldwide (spanning 270 multilocus sequence types (STs) across all known evolutionary clades), this study provides the first species-wide snapshot of AMR genomic epidemiology in C. difficile. Of the 10 330 C. difficile genomes, 4532 (43.9 %) in 89 STs across clades 1–5 carried at least one genotypic AMR determinant, with 901 genomes (8.7 %) carrying AMR determinants for three or more antimicrobial classes (multidrug-resistant, MDR). No AMR genotype was identified in any strains belonging to the cryptic clades. C. difficile from Australia/New Zealand had the lowest AMR prevalence compared to strains from Asia, Europe and North America (P<0.0001). Based on the phylogenetic clade, AMR prevalence was higher in clades 2 (84.3 %), 4 (81.5 %) and 5 (64.8 %) compared to other clades (collectively 26.9 %) (P<0.0001). MDR prevalence was highest in clade 4 (61.6 %), which was over three times higher than in clade 2, the clade with the second-highest MDR prevalence (18.3 %). There was a strong association between specific AMR determinants and three major epidemic C. difficile STs: ST1 (clade 2) with fluoroquinolone resistance (mainly T82I substitution in GyrA) (P<0.0001), ST11 (clade 5) with tetracycline resistance (various tet-family genes) (P<0.0001) and ST37 (clade 4) with macrolide-lincosamide-streptogramin B (MLSB) resistance (mainly ermB) (P<0.0001) and MDR (P<0.0001). A novel and previously overlooked tetM-positive transposon designated Tn6944 was identified, predominantly among clade 2 strains.
This study provides a comprehensive review of AMR in the global C. difficile population, which may aid in the early detection of drug-resistant C. difficile strains and prevention of their dissemination worldwide.
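As an illustration of the MDR definition used in this abstract (AMR determinants for three or more antimicrobial classes), the following is a minimal Python sketch. It is not the authors' pipeline: the function names, the example genome labels and the class labels are hypothetical, and only the genome counts passed to the prevalence helper are taken from the abstract.

```python
from typing import Iterable

# Threshold from the abstract: determinants for "three or more
# antimicrobial classes" define a multidrug-resistant (MDR) genome.
MDR_CLASS_THRESHOLD = 3


def is_mdr(amr_classes: Iterable[str]) -> bool:
    """Return True if the distinct antimicrobial classes reach the MDR threshold."""
    return len(set(amr_classes)) >= MDR_CLASS_THRESHOLD


def prevalence_percent(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Hypothetical genomes (illustrative only, not data from the study).
genomes = {
    "genome_a": ["fluoroquinolone", "tetracycline", "MLSB"],
    "genome_b": ["tetracycline", "tetracycline"],  # one distinct class
    "genome_c": [],                                # no AMR determinant
}
mdr_flags = {name: is_mdr(classes) for name, classes in genomes.items()}

# Counts reported in the abstract: 4532 and 901 of 10 330 genomes.
amr_prevalence = prevalence_percent(4532, 10330)
mdr_prevalence = prevalence_percent(901, 10330)
```

Counting distinct classes rather than individual determinants is the design choice that matters here: a genome carrying several genes from the same class is not counted as MDR under this definition.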
-