- Volume 9, Issue 6, 2023
- Editorials
-
- Personal Views
-
- Pathogens and Epidemiology
-
-
Pathogen genomics in public health laboratories: successes, challenges, and lessons learned from California’s SARS-CoV-2 Whole-Genome Sequencing Initiative, California COVIDNet
The capacity for pathogen genomics in public health expanded rapidly during the coronavirus disease 2019 (COVID-19) pandemic, but many public health laboratories did not have the infrastructure in place to handle the vast amount of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequence data generated. The California Department of Public Health, in partnership with Theiagen Genomics, was an early adopter of cloud-based resources for bioinformatics and genomic epidemiology, resulting in the creation of a SARS-CoV-2 genomic surveillance system that combined the efforts of more than 40 sequencing laboratories across government, academia and industry to form California COVIDNet, California’s SARS-CoV-2 Whole-Genome Sequencing Initiative. Open-source bioinformatics workflows, ongoing training sessions for the public health workforce, and automated data transfer to visualization tools all contributed to the success of California COVIDNet. While challenges remain for public health genomic surveillance worldwide, California COVIDNet serves as a framework for a scaled and successful bioinformatics infrastructure that has expanded beyond SARS-CoV-2 to other pathogens of public health importance.
-
- Reviews
-
- Functional Genomics and Microbe–Niche Interactions
-
-
Bacterium of one thousand and one variants: genetic diversity of Neisseria gonorrhoeae pathogenicity
The bacterium Neisseria gonorrhoeae causes the sexually transmitted infection gonorrhoea. Although diverse clinical manifestations are associated with gonorrhoea, ranging from asymptomatic through to localized and disseminated infection, very little is known about the bacterial determinants implicated in causing such different clinical symptoms. In particular, virulence factors, although defined and investigated in particular strains, often lack comprehensive analysis of their genetic diversity and how this relates to particular disease states. This review examines the clinical manifestations of gonorrhoea and discusses them in relation to disease severity and association with expression of particular virulence factors including PorB, lipooligosaccharide (LOS) and Opa, both in terms of their mechanisms of action and inter- and intra-strain variation. Particular attention is paid to phase variation as a key mechanism of genetic variation in the gonococcus and the impact of this during infection. We describe how whole-genome-sequence-based approaches that focus on virulence factors can be employed for vaccine development and discuss whether whole-genome-sequence data can be used to predict the severity of gonococcal infection.
-
- Research Articles
-
- Genomic Methodologies
-
-
Evidence of horizontal gene transfer within porB in 19 018 whole-genome Neisseria spp. isolates: a global phylogenetic analysis
The PorB porins are the major pore-forming proteins in the genus Neisseria. The trimeric PorB porins consist of 16 highly conserved transmembrane domains that form an amphipathic β-sheet connected by short periplasmic turns and eight extracellular hydrophilic loops. These loops are immunogenic and also play an important role in mediating antimicrobial influx. This study sought to (i) characterize the variations in Neisserial loop 3 (355–438 bp) associated with intermediate resistance to penicillin/tetracycline and (ii) evaluate if there was evidence of horizontal gene transfer in any of the loops. We collated an integrated database consisting of 19 018 Neisseria spp. genomes – 17 882 Neisseria gonorrhoeae, 114 Neisseria meningitidis and 1022 commensal Neisseria spp. To identify the porB alleles, a gene-by-gene approach (chewBBACA) was employed. To evaluate the presence of recombination events, the Recombination Detection Programme (RDP4) was used. In total, 3885 porB alleles were detected. Paralogues were identified in 17 Neisseria isolates. Putative recombination was identified in loop regions. Intraspecies recombination among N. gonorrhoeae isolates and interspecies recombination between N. meningitidis and commensal Neisseria spp., and N. gonorrhoeae and N. lactamica were identified. Here, we present a large-scale study of 19 018 Neisseria isolates to describe recombination and variation in the porB gene. Importantly, we found putative recombination in loop regions between the pathogenic and non-pathogenic Neisseria spp. These findings suggest the need for pheno- and genotypic surveillance of antimicrobial susceptibility in commensal Neisseria spp. to prevent the emergence of AMR in the pathogenic Neisseria. This article contains data hosted by Microreact.
-
-
-
Transconjugant range of PromA plasmids in microbial communities is predicted by sequence similarity with the bacterial host chromosome
Nucleotide sequence similarity, including k-mer plasmid composition, has been used for prediction of plasmid evolutionary host range, representing the hosts in which a plasmid has replicated at some point during its evolutionary history. However, the relationships between the bacterial taxa of experimentally identified transconjugants and the predicted evolutionary host ranges are poorly understood. Here, four different PromA group plasmids showing different k-mer compositions were used as model plasmids. Filter mating assays were performed with a donor harbouring plasmids and recipients of bacterial communities extracted from environmental samples. A broad range of transconjugants was obtained with different bacterial taxa. A calculation of the dissimilarities in k-mer compositions as Mahalanobis distance between the plasmid and its sequenced transconjugant chromosomes revealed that each plasmid and transconjugant were significantly more similar than the plasmid and other non-transconjugant chromosomes. These results indicate that plasmids with different k-mer compositions clearly have different host ranges to which the plasmid will be transferred and replicated. The similarity of the nucleotide compositions could be used for predicting not only the plasmid evolutionary host range but also future host ranges.
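The k-mer comparison described in this abstract can be illustrated with a small sketch. It is not the authors' pipeline: the window size, k = 2, the function names and the synthetic sequences are all assumptions made for illustration, and the covariance matrix is simplified to its diagonal (a variance-scaled distance), whereas a full Mahalanobis distance would use the complete covariance of the chromosomal window profiles.

```python
import math
import random
from itertools import product


def kmer_freqs(seq, k=2):
    """Relative frequencies of all 4**k k-mers in seq, in a fixed order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # ignore k-mers containing ambiguous bases
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts[km] / total for km in kmers]


def window_profile(chromosome, window=500, k=2):
    """Mean and variance of k-mer frequencies over non-overlapping windows."""
    starts = range(0, len(chromosome) - window + 1, window)
    profiles = [kmer_freqs(chromosome[s:s + window], k) for s in starts]
    n, dims = len(profiles), len(profiles[0])
    mean = [sum(p[d] for p in profiles) / n for d in range(dims)]
    var = [sum((p[d] - mean[d]) ** 2 for p in profiles) / n for d in range(dims)]
    return mean, var


def diag_mahalanobis(plasmid, chromosome, window=500, k=2, eps=1e-9):
    """Distance of the plasmid k-mer profile from the chromosome's window
    distribution, using only per-k-mer variances (diagonal covariance)."""
    mean, var = window_profile(chromosome, window, k)
    x = kmer_freqs(plasmid, k)
    return math.sqrt(sum((xi - m) ** 2 / (v + eps)
                         for xi, m, v in zip(x, mean, var)))


def random_seq(n, gc, rng):
    """Synthetic sequence with a given GC fraction (illustration only)."""
    weights = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]  # A, C, G, T
    return "".join(rng.choices("ACGT", weights=weights, k=n))


rng = random.Random(0)
at_rich_host = random_seq(20000, 0.30, rng)
gc_rich_host = random_seq(20000, 0.70, rng)
plasmid = random_seq(2000, 0.30, rng)

d_similar = diag_mahalanobis(plasmid, at_rich_host)
d_dissimilar = diag_mahalanobis(plasmid, gc_rich_host)
```

In this toy setting the plasmid should sit closer to the host whose base composition it shares; with real genomes, longer k-mers and the full covariance matrix would be the more faithful choice.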
-
- Functional Genomics and Microbe–Niche Interactions
-
-
A panoramic view of the genomic landscape of the genus Streptomyces
We delineate the evolutionary plasticity of the ecologically and biotechnologically important genus Streptomyces, by analysing the genomes of 213 species. Streptomycetes genomes demonstrate high levels of internal homology, whereas the genome of their last common ancestor was already complex. Importantly, we identify the species-specific fingerprint proteins that characterize each species. Even among closely related species, we observed high interspecies variability of chromosomal protein-coding genes, species-level core genes, accessory genes and fingerprints. Notably, secondary metabolite biosynthetic gene clusters (smBGCs), carbohydrate-active enzymes (CAZymes) and protein-coding genes bearing the rare TTA codon demonstrate high intraspecies and interspecies variability, which emphasizes the need for strain-specific genomic mining. Highly conserved genes, such as those specifying genus-level core proteins, tend to occur in the central region of the chromosome, whereas those encoding proteins with evolutionarily volatile species-level fingerprints, smBGCs, CAZymes and TTA-codon-bearing genes are often found towards the ends of the linear chromosome. Thus, the chromosomal arms emerge as the part of the genome that is mainly responsible for rapid adaptation at the species and strain level. Finally, we observed a moderate, but statistically significant, correlation between the total number of CAZymes and three categories of smBGCs (siderophores, e-Polylysin and type III lanthipeptides) that are related to competition among bacteria.
-
-
-
Genomic analysis of Kazachstania aerobia and Kazachstania servazzii reveals duplication of genes related to acetate ester production
Kazachstania aerobia and Kazachstania servazzii can affect wine aroma by increasing acetate ester concentrations, most remarkably phenylethyl acetate and isoamyl acetate. The genetic basis of this is unknown, there being little to no sequence data available on the genome architecture. We report for the first time the near-complete genome sequence of the two species using long-read (PacBio) sequencing (K. aerobia 20 contigs, one scaffold; and K. servazzii 22 contigs, one scaffold). The annotated genomes of K. aerobia (12.5 Mb) and K. servazzii (12.3 Mb) were compared to Saccharomyces cerevisiae genomes (laboratory strain S288C and wine strain EC1118). Whilst a comparison of the two Kazachstania spp. genomes revealed few differences between them, divergence was evident in relation to the genes involved in ester biosynthesis, for which gene duplications or absences were apparent. The annotations of these genomes are valuable resources for future research into the evolutionary biology of Kazachstania and other yeast species (comparative genomics) as well as understanding the metabolic processes associated with alcoholic fermentation and the production of secondary ‘aromatic’ metabolites (transcriptomics, proteomics and metabolomics).
-
-
-
A comparative genome analysis of the Bacillota (Firmicutes) class Dehalobacteriia
Dehalobacterium formicoaceticum is recognized for its ability to anaerobically ferment dichloromethane (DCM), and a catabolic model has recently been proposed. D. formicoaceticum is currently the only axenic representative of its class, the Dehalobacteriia, according to the Genome Taxonomy Database. However, substantial additional diversity has been revealed in this lineage through culture-independent exploration of anoxic habitats. Here we performed a comparative analysis of 10 members of the Dehalobacteriia, representing three orders, and infer that anaerobic DCM degradation appears to be a recently acquired trait only present in some members of the order Dehalobacteriales. Inferred traits common to the class include the use of amino acids as carbon and energy sources for growth, energy generation via a remarkable range of putative electron-bifurcating protein complexes and the presence of S-layers. The ability of D. formicoaceticum to grow on serine without DCM was experimentally confirmed and a high abundance of the electron-bifurcating protein complexes and S-layer proteins was noted when this organism was grown on DCM. We suggest that members of the Dehalobacteriia are low-abundance fermentative scavengers in anoxic habitats.
-
- Microbial Communities
-
-
Noise reduction strategies in metagenomic chromosome confirmation capture to link antibiotic resistance genes to microbial hosts
The gut microbiota is a reservoir for antimicrobial resistance genes (ARGs). With current sequencing methods, it is difficult to assign ARGs to their microbial hosts, particularly if these ARGs are located on plasmids. Metagenomic chromosome conformation capture approaches (meta3C and Hi-C) have recently been developed to link bacterial genes to phylogenetic markers, thus potentially allowing the assignment of ARGs to their hosts on a microbiome-wide scale. Here, we generated a meta3C dataset of a human stool sample and used previously published meta3C and Hi-C datasets to investigate bacterial hosts of ARGs in the human gut microbiome. Sequence reads mapping to repetitive elements were found to cause problematic noise in, and may importantly skew interpretation of, meta3C and Hi-C data. We provide a strategy to improve the signal-to-noise ratio by discarding reads that map to insertion sequence elements and to the end of contigs. We also show the importance of using spike-in controls to quantify whether the cross-linking step in meta3C and Hi-C protocols has been successful. After filtering to remove artefactual links, 87 ARGs were assigned to their bacterial hosts across all datasets, including 27 ARGs in the meta3C dataset we generated. We show that commensal gut bacteria are an important reservoir for ARGs, with genes coding for aminoglycoside and tetracycline resistance being widespread in anaerobic commensals of the human gut.
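The filtering idea in this abstract (discarding reads that map to insertion sequence elements or to contig ends before counting contacts) could be sketched roughly as follows. The data layout, the function names and the 500 bp end margin are hypothetical choices for illustration and are not taken from the study.

```python
def keep_read(contig, pos, contig_lengths, is_regions, end_margin=500):
    """Return True if a mapped read position is usable for linkage.

    A read is dropped when it falls within `end_margin` bp of either
    contig end, or inside an annotated insertion sequence (IS) interval.
    `is_regions` maps contig name -> list of half-open (start, end) intervals.
    """
    length = contig_lengths[contig]
    if pos < end_margin or pos >= length - end_margin:
        return False
    for start, end in is_regions.get(contig, []):
        if start <= pos < end:
            return False
    return True


def filter_links(links, contig_lengths, is_regions, end_margin=500):
    """Keep only read pairs where both mates pass `keep_read`.

    Each link is ((contig_a, pos_a), (contig_b, pos_b)).
    """
    return [
        (a, b)
        for a, b in links
        if keep_read(a[0], a[1], contig_lengths, is_regions, end_margin)
        and keep_read(b[0], b[1], contig_lengths, is_regions, end_margin)
    ]


# Toy example with made-up coordinates.
contig_lengths = {"c1": 10000, "c2": 8000}
is_regions = {"c1": [(4000, 5200)]}
links = [
    (("c1", 2000), ("c2", 3000)),  # both mates in unique, internal sequence
    (("c1", 4500), ("c2", 3000)),  # first mate inside an IS element
    (("c1", 100), ("c2", 3000)),   # first mate near the start of c1
    (("c1", 2000), ("c2", 7900)),  # second mate near the end of c2
]
filtered = filter_links(links, contig_lengths, is_regions)
```

Only the first pair survives here; in practice the margin and the IS annotation source would need to be tuned to the assembly at hand.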
-
- Pathogens and Epidemiology
-
-
Environmental dynamics of Campylobacter jejuni genotypes circulating in Luxembourg: what is the role of wild birds?
Campylobacter jejuni is the leading cause of bacterial gastroenteritis worldwide, but, unlike other foodborne pathogens, is not commonly reported as causing outbreaks. The population structure of the species is characterized by a high degree of genetic diversity, but the presence of stable clonally derived genotypes persisting in space and time, and potentially leading to diffuse outbreaks, has recently been identified. The spread of these recurring genotypes could be enhanced by wild birds, suspected to act as vectors for a wide range of microorganisms that can be transmissible to other animals or humans. This study assessed the genetic diversity of C. jejuni carriage in wild birds and surface waters to explore a potential link between these environments and the persistence over years of recurring lineages infecting humans in Luxembourg. These lineages corresponded to over 40 % of clinical isolates over a 4 year period from 2018 to 2021. While mainly exotic genotypes were recovered from environmental samples, 4 % of C. jejuni from wild birds corresponded to human recurring genotypes. Among them, a human clinical endemic lineage, occurring for over a decade in Luxembourg, was detected in one bird species, suggesting a possible contribution to the persistence of this clone and its multi-host feature. Whereas 27 % of wild birds were carriers of C. jejuni, confirming their role as spreader or reservoir, only three out of 59 genotypes overlapped with recurring human strains. While direct transmission of C. jejuni infection through wild birds remains questionable, they may play a key role in the environmental spreading of stable clones to livestock, and this issue merits further investigation.
-
-
-
Genomic analysis of the initial dissemination of carbapenem-resistant Klebsiella pneumoniae clones in a tertiary hospital
Carbapenem-resistant Klebsiella pneumoniae is a major cause of hospital-acquired infections and the fastest-growing pathogen in Europe. Carbapenem resistance was detected at the Consorcio Hospital General Universitario de Valencia (CHGUV) in early 2015, and there has been a significant increase in carbapenem-resistant isolates since then. In this study, we collected carbapenem-resistant isolates from this hospital during the period of increase (from 2015 to 2019) and studied how K. pneumoniae carbapenem-resistant isolates emerged and spread in the hospital. A total of 225 isolates were subjected to whole-genome sequencing with Illumina NextSeq. We characterized the isolates by identifying lineages and antimicrobial resistance genes and plasmids, especially those related to reduced carbapenem susceptibility. Our findings show that the initial carbapenem resistance emergence and dissemination at the CHGUV occurred during a short period of 1 year. Furthermore, it was complex, involving six different lineages of types ST307, ST11, ST101 and ST437, different resistance-determinant factors, including OXA-48, NDM-1, NDM-23 and DHA-1, and different plasmids.
-
-
-
Genomic analysis of extended-spectrum beta-lactamase (ESBL) producing Escherichia coli colonising adults in Blantyre, Malawi reveals previously undescribed diversity
Escherichia coli is one of the most prevalent Gram-negative species associated with drug resistant infections. Strains that produce extended-spectrum beta-lactamases (ESBLs) or carbapenemases are both particularly problematic and disproportionately impact resource limited healthcare settings where last-line antimicrobials may not be available. A large number of E. coli genomes are now available and have allowed insights into pathogenesis and epidemiology of ESBL E. coli but genomes from sub-Saharan Africa (sSA) are significantly underrepresented. To reduce this gap, we investigated ESBL-producing E. coli colonising adults in Blantyre, Malawi to assess bacterial diversity and AMR determinants and to place these isolates in the context of the wider population structure. We performed short-read whole-genome sequencing of 473 colonising ESBL E. coli isolated from human stool and contextualised the genomes with a previously curated multi-country collection of 10 146 E. coli genomes and sequence type (ST)-specific collections for our three most commonly identified STs. These were the globally successful ST131, ST410 and ST167, and the dominant ESBL genes were bla CTX-M, mirroring global trends. However, 37 % of Malawian isolates did not cluster with any isolates in the curated multicountry collection and phylogenies were consistent with locally spreading monophyletic clades, including within the globally distributed, carbapenemase-associated B4/H24RxC ST410 lineage. A single ST2083 isolate in this collection harboured a carbapenemase gene. Long read sequencing demonstrated the presence of a globally distributed ST410-associated carbapenemase carrying plasmid in this isolate, which was absent from the ST410 strains in our collection. We conclude there is a risk that carbapenem resistance in E. coli could proliferate rapidly in Malawi under increasing selection pressure, and that both ongoing antimicrobial stewardship and genomic surveillance are critical as local carbapenem use increases.
-
-
-
A novel variant of the Listeria monocytogenes type VII secretion system EssC component is associated with an Rhs toxin
The type VIIb protein secretion system (T7SSb) is found in Bacillota (firmicute) bacteria and has been shown to mediate interbacterial competition. EssC is a membrane-bound ATPase that is a critical component of the T7SSb and plays a key role in substrate recognition. Prior analysis of available genome sequences of the foodborne bacterial pathogen Listeria monocytogenes has shown that although the T7SSb was encoded as part of the core genome, EssC could be found as one of seven different sequence variants. While each sequence variant was associated with a specific suite of candidate substrate proteins encoded immediately downstream of essC, many LXG-domain proteins were encoded across multiple essC sequence variants. Here, we have extended this analysis using a diverse collection of 37 930 L. monocytogenes genomes. We have identified a rare eighth variant of EssC present in ten L. monocytogenes lineage III genomes. These genomes also encode a large toxin of the rearrangement hotspot (Rhs) repeat family adjacent to essC8, along with a probable immunity protein and three small accessory proteins. We have further identified nine novel LXG-domain proteins, and four additional chromosomal hotspots across L. monocytogenes genomes where LXG proteins can be encoded. The eight L. monocytogenes EssC variants were also found in other Listeria species, with additional novel EssC types also identified. Across the genus, species frequently encoded multiple EssC types, indicating that T7SSb diversity is a primary feature of the genus Listeria.
-
-
-
Comparative genomics uncovered differences between clinical and environmental populations of Vibrio parahaemolyticus in New Zealand
Vibrio parahaemolyticus has been identified as an emerging human pathogen worldwide with cases undergoing a global expansion over recent decades in phase with climate change. New Zealand had remained free of outbreaks until 2019, but different outbreaks have been reported consecutively since then. To provide new insights into the recent emergence of cases associated with outbreak clones over recent years, a comparative genomic study was carried out using a selection of clinical (mostly outbreak) and environmental isolates of V. parahaemolyticus obtained in New Zealand between 1973 and 2021. Among 151 isolates of clinical (n=60) and environmental (n=91) origin, 47 sequence types (STs) were identified, including 31 novel STs. The population of environmental isolates generated 30 novel STs, whereas only 1 novel ST (ST2658) was identified among the population of clinical isolates. The novel clinical ST was a single-locus variant of the pandemic ST36 strain, indicating further evolution of this pandemic strain. The environmental isolates exhibited a significant genetic heterogeneity compared to the clinical isolates. The whole-genome phylogeny separated the population of clinical isolates from their environmental counterparts, clearly indicating their distant genetic relatedness. In addition to differences in ancestral profiles and genetic relatedness, these two groups of isolates exhibited a profound difference in their virulence profiles. While the entire population of clinical isolates harboured the thermostable direct haemolysin (tdh) and/or the thermostable-related haemolysin (trh), only a few isolates of environmental origin possessed the same virulence genes. In contrast to tdh and trh, adhesin-encoding genes, vpadF and MSHA, showed a significantly (P<0.001) greater association with the environmental isolates compared to the clinical isolates. The effectors, VopQ, VPA0450 and VopS, which belong to T3SS1, were ubiquitous, being present in each isolate regardless of its origin. The effectors VopC and VopA, which belong to T3SS2, were rarely detected in any of the examined isolates. Our data indicate that the clinical and environmental isolates of V. parahaemolyticus from New Zealand differ in their population structures, ancestral profiles, genetic relatedness and virulence profiles. In addition, we identified numerous unique non-synonymous single-nucleotide polymorphisms (nsSNPs) in adhesins and effectors, exclusively associated with the clinical isolates tested, which may suggest a possible role of these mutations in the overall virulence of the clinical isolates.
-
-
-
Identification of further variation at the lipooligosaccharide outer core locus in Acinetobacter baumannii genomes and extension of the OCL reference sequence database for Kaptive
The outer core locus (OCL) that includes genes for the synthesis of the variable outer core region of the lipooligosaccharide (LOS) is one of the key epidemiological markers used for tracing the spread of Acinetobacter baumannii, a bacterial pathogen of global concern. In this study, we screened 12 476 publicly available A. baumannii genome assemblies for novel OCL sequences, detecting six new OCL types that were designated OCL17–OCL22. These were compiled with previously characterized OCL sequences to create an updated version of the A. baumannii OCL reference database, providing a total of 22 OCL reference sequences for use with the bioinformatics tool Kaptive. Use of this database against the 12 476 downloaded assemblies found OCL1 to be the most common locus, present in 73.6 % of sequenced genomes assigned by Kaptive with a match confidence score of good or above. OCL1 was most common amongst isolates belonging to sequence types (STs) ST1, ST2, ST3 and ST78, all of which are over-represented clonal lineages. The highest level of diversity in OCL types was found in ST2, with eight different OCLs identified. The updated OCL reference database is available for download from GitHub (https://github.com/klebgenomics/Kaptive; under version v. 2.0.5), and has been integrated for use on Kaptive-Web (https://kaptive-web.erc.monash.edu/) and PathogenWatch (https://pathogen.watch/), enhancing current methods for A. baumannii strain identification, classification and surveillance.
-
-
-
Enteric fever cluster identification in South Africa using genomic surveillance of Salmonella enterica serovar Typhi
The National Institute for Communicable Diseases in South Africa participates in national laboratory-based surveillance for human isolates of Salmonella species. Laboratory analysis includes whole-genome sequencing (WGS) of isolates. We report on WGS-based surveillance of Salmonella enterica serovar Typhi (Salmonella Typhi) in South Africa from 2020 through 2021. We describe how WGS analysis identified clusters of enteric fever in the Western Cape Province of South Africa and describe the epidemiological investigations associated with these clusters. A total of 206 Salmonella Typhi isolates were received for analysis. Genomic DNA was isolated from bacteria and WGS was performed using Illumina NextSeq technology. WGS data were investigated using multiple bioinformatics tools, including those available at the Centre for Genomic Epidemiology, EnteroBase and Pathogenwatch. Core-genome multilocus sequence typing was used to investigate the phylogeny of isolates and identify clusters. Three major clusters of enteric fever were identified in the Western Cape Province: cluster one (n=11 isolates), cluster two (n=13 isolates), and cluster three (n=14 isolates). To date, no likely source has been identified for any of the clusters. All isolates associated with the clusters showed the same genotype (4.3.1.1.EA1) and resistome (antimicrobial resistance genes: bla TEM-1B, catA1, sul1, sul2, dfrA7). The implementation of genomic surveillance of Salmonella Typhi in South Africa has enabled rapid detection of clusters indicative of possible outbreaks. Cluster identification allows for targeted epidemiological investigations and a timely, coordinated public health response.
-
-
-
A local-scale One Health genomic surveillance of Clostridioides difficile demonstrates highly related strains from humans, canines, and the environment
Although infections caused by Clostridioides difficile have historically been attributed to hospital acquisition, growing evidence supports the role of community acquisition in C. difficile infection (CDI). Symptoms of CDI can range from mild, self-resolving diarrhoea to toxic megacolon, pseudomembranous colitis, and death. In this study, we sampled C. difficile from clinical, environmental, and canine reservoirs in Flagstaff, Arizona, USA, to understand the distribution and transmission of the pathogen in a One Health framework; Flagstaff is a medium-sized, geographically isolated city with a single hospital system, making it an ideal site to characterize genomic overlap between sequenced C. difficile isolates across reservoirs. An analysis of 562 genomes from Flagstaff isolates identified 65 sequence types (STs), with eight STs being found across all three reservoirs and another nine found across two reservoirs. A screen of toxin genes in the pathogenicity locus identified nine STs where all isolates lost the toxin genes needed for CDI manifestation (tcdB, tcdA), demonstrating the widespread distribution of non-toxigenic C. difficile (NTCD) isolates in all three reservoirs; 15 NTCD genomes were sequenced from symptomatic, clinical samples, including two from mixed infections that contained both tcdB+ and tcdB- isolates. A comparative single nucleotide polymorphism (SNP) analysis of clinically derived isolates identified 78 genomes falling within clusters separated by ≤2 SNPs, indicating that ~19 % of clinical isolates are associated with potential healthcare-associated transmission clusters; only symptomatic cases were sampled in this study, and we did not sample asymptomatic transmission. Using this same SNP threshold, we identified genomic overlap between canine and soil isolates, as well as putative transmission between environmental and human reservoirs. The core genome of isolates sequenced in this study plus a representative set of public C. difficile genomes (n=136) was 2690 coding region sequences, which constitutes ~70 % of an individual C. difficile genome; this number is significantly higher than has been published in some other studies, suggesting that genome data quality is important in understanding the minimal number of genes needed by C. difficile. This study demonstrates the close genomic overlap among isolates sampled across reservoirs, which was facilitated by maximizing the genomic search space used for comprehensive identification of potential transmission events. Understanding the distribution of toxigenic and non-toxigenic C. difficile across reservoirs has implications for surveillance sampling strategies, characterizing routes of infections, and implementing mitigation measures to limit human infection.
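The ≤2 SNP clustering mentioned in this abstract can be illustrated with a short sketch. It assumes pre-aligned, equal-length core-genome sequences and single-linkage grouping; the abstract does not state the linkage rule, so that choice, the function names and the toy sequences are all assumptions.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count positions where two aligned sequences differ.

    Positions where either sequence has a gap or ambiguous base are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b)
               if x != y and x in "ACGT" and y in "ACGT")


def cluster_by_snp_threshold(aligned, threshold=2):
    """Single-linkage clusters of isolates whose pairwise SNP distance is
    at most `threshold`; singletons are omitted from the result."""
    parent = {name: name for name in aligned}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(aligned, 2):
        if snp_distance(aligned[a], aligned[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in aligned:
        groups.setdefault(find(name), set()).add(name)
    return [members for members in groups.values() if len(members) > 1]


# Toy alignment: two human isolates, one canine, one soil (made-up data).
aligned = {
    "human_1": "ACGTACGTAC",
    "human_2": "ACGTACGTAT",
    "canine_1": "ACGTACGAAT",
    "soil_1": "TTTTACGTGG",
}
clusters = cluster_by_snp_threshold(aligned, threshold=2)
```

With single linkage, an isolate joins a cluster if it is within the threshold of any member, so chains of closely related isolates from different reservoirs end up grouped together.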
-
- Evolution and Responses to Interventions
-
-
PanGraph: scalable bacterial pan-genome graph construction
The genomic diversity of microbes is commonly parameterized as SNPs relative to a reference genome of a well-characterized, but arbitrary, isolate. However, any reference genome contains only a fraction of the microbial pangenome, the total set of genes observed in a given species. Reference-based approaches are thus blind to the dynamics of the accessory genome, as well as variation within gene order and copy number. With the widespread usage of long-read sequencing, the number of high-quality, complete genome assemblies has increased dramatically. In addition to pangenomic approaches that focus on the variation in the sets of genes present in different genomes, complete assemblies allow investigations of the evolution of genome structure and gene order. This latter problem, however, is computationally demanding with few tools available that shed light on these dynamics. Here, we present PanGraph, a Julia-based library and command line interface for aligning whole genomes into a graph. Each genome is represented as a path along vertices, which in turn encapsulate homologous multiple sequence alignments. The resultant data structure succinctly summarizes population-level nucleotide and structural polymorphisms and can be exported into several common formats for either downstream analysis or immediate visualization.
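The representation described in this abstract, where each genome is a path along vertices that stand for homologous blocks, can be pictured with a toy data structure. This is not PanGraph's API or file format (PanGraph itself is written in Julia); the block names, the tuple layout and the helper functions are hypothetical and only illustrate how core and accessory blocks fall out of such paths.

```python
from collections import Counter


def block_counts(path):
    """Count how many times each block occurs in one genome's path.

    A path is a list of (block_id, strand) tuples, with strand '+' or '-'.
    """
    return Counter(block for block, _strand in path)


def core_blocks(paths):
    """Blocks present exactly once in every genome."""
    genomes = list(paths.values())
    if not genomes:
        return set()
    counts = [block_counts(p) for p in genomes]
    return {block for block in counts[0]
            if all(c[block] == 1 for c in counts)}


def accessory_blocks(paths):
    """Blocks that are missing from, or duplicated in, at least one genome."""
    all_blocks = {block for p in paths.values() for block, _strand in p}
    return all_blocks - core_blocks(paths)


# Three toy genomes sharing blocks A and C; B is absent from g2 and
# duplicated in g3, D is unique to g2, and C is inverted in g3.
paths = {
    "g1": [("A", "+"), ("B", "+"), ("C", "+")],
    "g2": [("A", "+"), ("C", "+"), ("D", "+")],
    "g3": [("A", "+"), ("B", "+"), ("B", "+"), ("C", "-")],
}
core = core_blocks(paths)
accessory = accessory_blocks(paths)
```

Keeping strand on each path step is what lets a structure like this record inversions as well as presence, absence and copy number.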
-
-
-
Comparative genomics reveals intraspecific divergence of Acidithiobacillus ferrooxidans: insights from evolutionary adaptation
Acidithiobacillus ferrooxidans serves as a model chemolithoautotrophic organism in extremely acidic environments, which has attracted much attention due to its unique metabolism and strong adaptability. However, little was known about the divergences along the evolutionary process based on whole genomes. Herein, we isolated six strains of A. ferrooxidans from mining areas in China and Zambia, and used comparative genomics to investigate the intra-species divergences. The results indicated that A. ferrooxidans diverged into three groups from a common ancestor, and the pan-genome is ‘open’. The ancestral reconstruction of A. ferrooxidans indicated that genome sizes experienced a trend of increase in the very earliest days before a decreasing tendency during the evolutionary process, suggesting that both gene gain and gene loss played crucial roles in A. ferrooxidans genome flexibility. Meanwhile, 23 single-copy orthologous groups (OGs) were under positive selection. The differences of rusticyanin (Rus) sequences (the key protein in the iron oxidation pathway) and type IV secretion system (T4SS) composition in the A. ferrooxidans were both related to their group divergences, which contributed to their intraspecific diversity. This study improved our understanding of the divergent evolution and environmental adaptation of A. ferrooxidans at the genome level in extreme conditions, which provided theoretical support for the survival mechanism of living creatures at the extreme.
-
- Short Communications
-
- Genomic Methodologies
-
-
PfaSTer: a machine learning-powered serotype caller for Streptococcus pneumoniae genomes
Streptococcus pneumoniae (pneumococcus) is a leading cause of morbidity and mortality worldwide. Although multi-valent pneumococcal vaccines have curbed the incidence of disease, their introduction has resulted in shifted serotype distributions that must be monitored. Whole genome sequence (WGS) data provide a powerful surveillance tool for tracking isolate serotypes, which can be determined from nucleotide sequence of the capsular polysaccharide biosynthetic operon (cps). Although software exists to predict serotypes from WGS data, most are constrained by requiring high-coverage next-generation sequencing reads. This can present a challenge in respect of accessibility and data sharing. Here we present PfaSTer, a machine learning-based method to identify 65 prevalent serotypes from assembled S. pneumoniae genome sequences. PfaSTer combines dimensionality reduction from k-mer analysis with a Random Forest classifier for rapid serotype prediction. By leveraging the model’s built-in statistical framework, PfaSTer determines confidence in its predictions without the need for coverage-based assessments. We then demonstrate the robustness of this method, returning >97 % concordance when compared to biochemical results and other in silico serotyping tools. PfaSTer is open source and available at: https://github.com/pfizer-opensource/pfaster.
-