Volume 11, Issue 1, 2025
- Research Articles
-
- Genomic Methodologies
-
-
Exploring SNP filtering strategies: the influence of strict vs soft core
Phylogenetic analyses are crucial for understanding microbial evolution and infectious disease transmission. Bacterial phylogenies are often inferred from SNP alignments, with SNPs as the fundamental signal within these data. SNP alignments can be reduced to a ‘strict core’ by removing those sites that do not have data present in every sample. However, as sample size and genome diversity increase, a strict core can shrink markedly, discarding potentially informative data. Here, we propose and provide evidence to support the use of a ‘soft core’ that tolerates some missing data, preserving more information for phylogenetic analysis. Using large datasets of Neisseria gonorrhoeae and Salmonella enterica serovar Typhi, we assess different core thresholds. Our results show that strict cores can drastically reduce informative sites compared to soft cores. In a 10 000-genome alignment of Salmonella enterica serovar Typhi, a 95% soft core yielded ten times more informative sites than a 100% strict core. Similar patterns were observed in N. gonorrhoeae. We further evaluated the accuracy of phylogenies built from strict- and soft-core alignments using datasets with strong temporal signals. Soft-core alignments generally outperformed strict cores in producing trees displaying clock-like behaviour; for instance, the N. gonorrhoeae 95% soft-core phylogeny had a root-to-tip regression R² of 0.50 compared to 0.21 for the strict-core phylogeny. This study suggests that soft-core strategies are preferable for large, diverse microbial datasets. To facilitate this, we developed Core-SNP-filter (https://github.com/rrwick/Core-SNP-filter), an open-source software tool for generating soft-core alignments from whole-genome alignments based on user-defined thresholds.
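The strict-core versus soft-core distinction described in this abstract can be illustrated with a minimal sketch. It is not the Core-SNP-filter implementation: the function names `core_columns` and `filter_alignment` are hypothetical, the alignment is a plain dictionary of equal-length strings, and a site is assumed to count as "present" only when it holds an unambiguous A, C, G or T.

```python
from typing import Dict, List

# Assumption: only unambiguous bases count as data; gaps, N and other
# ambiguity codes are treated as missing.
PRESENT = set("ACGTacgt")


def core_columns(alignment: Dict[str, str], threshold: float) -> List[int]:
    """Return indices of columns where the fraction of samples with data
    is at least `threshold` (1.0 = strict core, e.g. 0.95 = soft core)."""
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    n_samples = len(seqs)
    keep = []
    for i in range(length):
        present = sum(1 for s in seqs if s[i] in PRESENT)
        if present / n_samples >= threshold:
            keep.append(i)
    return keep


def filter_alignment(alignment: Dict[str, str], threshold: float) -> Dict[str, str]:
    """Keep only the columns that pass the core threshold."""
    cols = core_columns(alignment, threshold)
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


if __name__ == "__main__":
    toy = {"a": "ACGT-", "b": "AC-TN", "c": "ACGTA", "d": "A-GTA"}
    print(filter_alignment(toy, 1.0))   # strict core: columns 0 and 3 survive
    print(filter_alignment(toy, 0.75))  # 75% soft core: columns 0 to 3 survive
```

In this toy alignment the strict core keeps two of five columns, while a 75% soft core keeps four, which mirrors on a small scale the loss of sites the abstract reports for large datasets.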
-
- Functional Genomics and Microbe–Niche Interactions
-
-
The dominant lineage of an emerging pathogen harbours contact-dependent inhibition systems
Bacteria from the Stenotrophomonas maltophilia complex (Smc) are important multidrug-resistant pathogens that cause a broad range of infections. Smc is genomically diverse and has been classified into 23 lineages. Lineage Sm6 is the most common among sequenced strains, but it is unclear why this lineage has evolved to be dominant. Antagonistic interactions can significantly affect the evolution of bacterial populations. These interactions may be mediated by secreted contact-dependent proteins, which allow inhibitor cells to intoxicate adjacent target bacteria. Contact-dependent inhibition (CDI) requires three proteins: CdiA, CdiB and CdiI. CdiA is a large, filamentous protein exported to the surface of inhibitor cells through the pore-like CdiB. The CdiA C-terminal domain (CdiA-CT) is toxic when delivered into target cells of the same species or genus. CdiI immunity proteins neutralize the toxicity of cognate CdiA-CT toxins. We found that all complete Smc genomes from the Sm6 lineage harbour at least one CDI locus. By contrast, less than a quarter of strains from other lineages have CDI genes. Smc CdiA-CT domains are diverse and have a broad range of predicted functions. Most Sm6 strains harbour non-cognate cdiI genes predicted to provide protection against foreign toxins from other strains. Finally, we demonstrated that an Smc CdiA-CT toxin has antibacterial properties and is neutralized by its cognate CdiI.
-
-
-
Shigella sonnei and Shigella flexneri infection in Caenorhabditis elegans led to species-specific regulatory responses in the host and pathogen
In recent decades, Shigella sonnei has surpassed Shigella flexneri as the leading cause of shigellosis, possibly due to species-specific differences in their transcriptomic responses. This study used dual RNA sequencing to analyse the transcriptomic responses of Caenorhabditis elegans and the two Shigella species at early (10 minutes) and late (24 hours) stages of infection. While the nematode defence response was downregulated during both Shigella infections, only infection by S. sonnei led to downregulation of sphingolipid metabolism, cadmium ion response and xenobiotic response in C. elegans. Furthermore, compared to S. flexneri, S. sonnei upregulated biofilm formation and energy generation/conservation during infection, along with acid resistance-related genes and biofilm regulators. These findings highlight species-specific responses during C. elegans infection.
-
- Metagenomics and Microbiomes
-
-
Impact of simulation and reference catalogues on the evaluation of taxonomic profiling pipelines
Microbiome profiling tools rely on reference catalogues, which significantly affect their performance. Comparing them is, however, challenging, mainly due to differences in their native catalogues. In this study, we present a novel standardized benchmarking framework that makes such comparisons more accurate. We decided not to customize databases but to translate results to a common reference, so that the tools can be used in their native environment. Specifically, we conducted two realistic simulations of gut microbiome samples, each based on a specific taxonomic profiler, and used two different taxonomic references to project their results, namely the Genome Taxonomy Database and the Unified Human Gastrointestinal Genome. To demonstrate the importance of using such a framework, we evaluated four established profilers as well as the impact of the simulations and that of the common taxonomic references on the perceived performance of these profilers. Finally, we provide guidelines to enhance future profiler comparisons for human microbiome ecosystems: (i) use or create realistic simulations tailored to your biological context (BC), (ii) identify a common feature space suited to your BC and independent of the catalogues used by the profilers and (iii) apply a comprehensive set of metrics covering accuracy (sensitivity/precision), overall representativity (richness/Shannon) and quantification (UniFrac and/or Aitchison distance).
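Some of the metrics named in guideline (iii) can be sketched in a few lines. This is an illustrative sketch only, not the benchmarking framework from the article: the function names are hypothetical, profiles are assumed to be dictionaries mapping taxon names (already projected to a common reference) to relative abundances, and the Shannon index uses the natural logarithm. UniFrac and Aitchison distance are omitted because they need a tree and a zero-handling strategy, respectively.

```python
import math
from typing import Dict, Set, Tuple


def sensitivity_precision(truth: Set[str], predicted: Set[str]) -> Tuple[float, float]:
    """Presence/absence accuracy of a predicted taxon set against the simulated truth."""
    true_positives = len(truth & predicted)
    sensitivity = true_positives / len(truth) if truth else 0.0
    precision = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, precision


def richness(profile: Dict[str, float]) -> int:
    """Number of taxa with non-zero abundance."""
    return sum(1 for abundance in profile.values() if abundance > 0)


def shannon(profile: Dict[str, float]) -> float:
    """Shannon diversity index (natural log) of an abundance profile."""
    total = sum(profile.values())
    if total <= 0:
        return 0.0
    index = 0.0
    for abundance in profile.values():
        if abundance > 0:
            p = abundance / total
            index -= p * math.log(p)
    return index


if __name__ == "__main__":
    truth = {"A", "B", "C", "D"}       # taxa present in the simulation
    predicted = {"A", "B", "E"}        # taxa reported by a profiler
    print(sensitivity_precision(truth, predicted))
    print(richness({"A": 0.5, "B": 0.5, "C": 0.0}))
    print(shannon({"A": 0.5, "B": 0.5, "C": 0.0}))
```

Comparing sets of taxon names only makes sense once every profiler's output has been translated to the same feature space, which is the point of guideline (ii).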
-
-
-
Ecological insights into the microbiology of food using metagenomics and its potential surveillance applications
A diverse array of micro-organisms can be found on food, including those that are pathogenic or resistant to antimicrobial drugs. Metagenomics involves extracting and sequencing the DNA of all micro-organisms in a sample. Here, we used a combination of culture and culture-independent approaches to investigate the microbial ecology of food and to assess the potential application of metagenomics for the microbial surveillance of food. We cultured common foodborne pathogens and other organisms including Escherichia coli, Klebsiella/Raoultella spp., Salmonella spp. and Vibrio spp. from five different food commodities and compared their genomes to the microbial communities obtained by metagenomic sequencing following host (food) DNA depletion. The microbial populations of retail food were found to be predominated by psychrotrophic bacteria, driven by the cool temperatures in which the food products are stored. Pathogens accounted for a small percentage of the food metagenome compared to the psychrotrophic bacteria, and cultured pathogens were inconsistently identified in the metagenome data. The microbial composition of food varied amongst different commodities, and metagenomics was able to classify the taxonomic origin of 59% of antimicrobial resistance genes (ARGs) found on food to the genus level, but it was unclear what percentage of ARGs were associated with mobile genetic elements and thus transferable to other bacteria. Metagenomics may be used to survey the ARG burden, composition and carriage on foods to which consumers are exposed. However, food metagenomics, even after depleting host DNA, inconsistently identifies pathogens without enrichment or further bait capture.
-
-
-
An ecological-evolutionary perspective on the genomic diversity and habitat preferences of the Acidobacteriota
Members of the phylum Acidobacteriota inhabit a wide range of ecosystems including soils. We analysed the global patterns of distribution and habitat preferences of various Acidobacteriota lineages across major ecosystems (soil, engineered, host-associated, marine, non-marine saline and alkaline and terrestrial non-soil ecosystems) in 248 559 publicly available metagenomic datasets. Classes Terriglobia, Vicinamibacteria, Blastocatellia and Thermoanaerobaculia were highly ubiquitous and showed a clear preference for soil over non-soil habitats, while classes Aminicenantia and Holophagae showed preferences for non-soil habitats. However, while specific preferences were observed, most Acidobacteriota lineages were habitat generalists rather than specialists, with genomic and/or metagenomic fragments recovered from soil and non-soil habitats at various levels of taxonomic resolution. Comparative analysis of 1930 genomes strongly indicates that phylogenetic affiliation plays a more important role than the habitat from which the genome was recovered in shaping the genomic characteristics and metabolic capacities of the Acidobacteriota. The observed lack of strong habitat specialization and habitat-transition-driven lineage evolution in the Acidobacteriota suggests ready cross-colonization between soil and non-soil habitats. We posit that such capacity is key to the successful establishment of Acidobacteriota as a major component in soil microbiomes post-ecosystem disturbance events or during pedogenesis.
-
- Pathogens and Epidemiology
-
-
Evaluation of nationwide analysis surveillance for methicillin-resistant Staphylococcus aureus within Genomic Medicine Sweden
Background. National epidemiological investigations of microbial infections greatly benefit from the increased information gained by whole-genome sequencing (WGS) in combination with standardized approaches for data sharing and analysis.
Aim. To evaluate the quality and accuracy of WGS data generated by different laboratories but analysed by joint pipelines to reach a national surveillance approach.
Methods. A national methicillin-resistant Staphylococcus aureus (MRSA) collection of 20 strains was distributed to nine participating laboratories that performed in-house procedures for WGS. Raw data were shared and analysed by three pipelines: 1928 Diagnostics, JASEN (GMS pipeline) and CLC-Genomics Workbench. The outcomes were compared according to quality, correct strain identification and genetic distances.
Results. One isolate contained intraspecies contamination and was excluded from further analysis. The mean sequencing depth varied between sites and technologies. However, all analysis methods identified 12 strains that belonged to one of five outbreak clusters. The cut-off definition was set to <10 allele differences for core genome multilocus sequence typing (cgMLST) and <20 genetic differences for SNP analysis in a pairwise comparison.
Conclusions. MRSA isolates, which are whole genome sequenced by different laboratories and analysed using the same bioinformatic pipelines, yielded comparable results for outbreak clustering for both cgMLST and SNP, using the 1928 analysis pipeline. In this study, JASEN was best suited to analyse Illumina data and CLC to analyse within respective technology. In the future, real-time sharing of data and harmonized analysis within the Genomic Medicine Sweden consortium will further facilitate investigations of outbreaks and transmission routes.
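The cut-off of <10 allele differences in a pairwise cgMLST comparison can be illustrated with a small sketch. The abstract does not say how pairwise distances were turned into clusters, so single-linkage grouping is an assumption here, as are the function names, the representation of a cgMLST profile as a list of allele numbers, and the choice to ignore loci with a missing call (`None`).

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence

Profile = Sequence[Optional[int]]  # allele number per locus, None if not called


def allele_distance(p: Profile, q: Profile) -> int:
    """Count loci where both isolates have a call and the alleles differ."""
    return sum(1 for a, b in zip(p, q) if a is not None and b is not None and a != b)


def single_linkage_clusters(profiles: Dict[str, Profile], cutoff: int) -> List[List[str]]:
    """Group isolates connected by any chain of pairs with distance < cutoff."""
    names = list(profiles)
    parent = {name: name for name in names}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if allele_distance(profiles[a], profiles[b]) < cutoff:
            parent[find(a)] = find(b)

    clusters: Dict[str, List[str]] = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    # Toy five-locus profiles; a real cgMLST scheme has far more loci,
    # so a small cutoff of 2 is used here purely for illustration.
    toy = {
        "s1": [1, 1, 1, 1, 1],
        "s2": [1, 1, 1, 1, 2],
        "s3": [1, 1, 1, 2, 2],
        "s4": [9, 9, 9, 9, 9],
    }
    print(single_linkage_clusters(toy, cutoff=2))
```

With single linkage, s1 and s3 end up in the same cluster through s2 even though they differ at two loci; a stricter rule (for example, requiring every pair in a cluster to fall under the cut-off) would split them.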
-
-
-
Diversification of blaOXA-48-harbouring plasmids among carbapenemase-producing Enterobacterales, 11 years after a large outbreak in a general hospital in the Netherlands
Introduction. Genes encoding OXA-48-like carbapenem-hydrolyzing enzymes are often located on plasmids and are abundant among carbapenemase-producing Enterobacterales (CPE) worldwide. After a large blaOXA-48 plasmid-mediated outbreak in 2011, routine screening of patients at risk of CPE carriage on admission and every 7 days during hospitalization was implemented in a large hospital in the Netherlands. The objective of this study was to investigate the dynamics of the hospital's 2011 outbreak-associated blaOXA-48 plasmid among CPE collected from 2011 to 2021.
Methods. A selection of 86 blaOXA-48-carrying CPE isolates was made from 374 isolates collected over an 11-year study period. Species included Escherichia coli (Eco), Klebsiella pneumoniae (Kpn), Enterobacter cloacae complex (Ecl), Citrobacter freundii (Cfr), Citrobacter koseri (Cko) and Morganella morganii (Mmo). Short-read sequencing was combined with long-read sequencing for all isolates to reconstruct blaOXA-48-like plasmids and chromosomes of CPE. MASH, MOBsuite, ResFinder, PlasmidFinder and SNP analyses were performed to study diversity. pOXA-48 plasmids were compared to plasmid sequences that were sequenced for the Dutch CPE surveillance in the same time period.
Results. Of the 86 CPE isolates, genomic assembly failed for 2; 78 blaOXA-48-encoding plasmids were reconstructed, and six blaOXA-48 genes were located chromosomally. The 2011 outbreak-associated blaOXA-48 plasmid of 63.6 kb with IncL replicon, which had LR025105 as its MASH nearest neighbour, was found in Cfr, Ecl, Eco, Kpn and Mmo, primarily between 2011 and 2014. From 2014 onwards, 11 other types of blaOXA-48-carrying plasmids with different antibiotic resistance genes and replicons were discovered, representing the earlier defined distinct pOXA-48 plasmid groups found in the Netherlands. Furthermore, on a national level, the LR025105 plasmid was found after 2015 in many different bacterial backgrounds, highlighting the promiscuous nature of this pOXA-48 plasmid.
Conclusion. After a large blaOXA-48 outbreak in a large hospital in the Netherlands, the composition of the blaOXA-48 plasmid population in this hospital diversified over time and is in line with national surveillance data. Plasmid sequencing provided valuable insight into the transmission dynamics of blaOXA-48-encoding plasmids and showed no indication of the persistence of the 2011 blaOXA-48 plasmid in the hospital environment.
-
-
-
Multiple introductions of NRCS-A Staphylococcus capitis to the neonatal intensive care unit drive neonatal bloodstream infections: a case-control and environmental genomic survey
Background. The Staphylococcus capitis NRCS-A strain has emerged as a global cause of late-onset sepsis associated with outbreaks in neonatal intensive care units (NICUs), but its transmission is incompletely understood.
Methods. Demographic and clinical data for 45 neonates with S. capitis and 90 with other coagulase-negative staphylococci (CoNS) isolated from sterile sites were reviewed, and clinical significance was determined. S. capitis isolated from 27 neonates at 2 hospitals between 2017 and 2022 underwent long-read (ONT) (n=27) and short-read (Illumina) sequencing (n=18). These sequences were compared with S. capitis sequenced from blood culture isolates from other adult and paediatric patients in the same hospitals (n=6), S. capitis isolated from surface swabs (found in 5/150 samples), rectal swabs (in 2/69 samples) in NICU patients and NICU environmental samples (in 5/114 samples). Reads from all samples were mapped to a hybrid assembly of a local sterile site strain, forming a complete UK NRCS-A reference genome, for outbreak analysis and comparison with 826 other S. capitis from the UK and Germany.
Results. S. capitis bacteraemia was associated with increased length of NICU stay at sampling (median day 22 vs day 12 for other CoNS isolated; P=0.05). A phylogeny of sequenced S. capitis revealed a cluster comprised of 25/27 neonatal sterile site isolates and 3/5 superficial, 2/2 rectal and 1/5 environmental isolates. No isolates from other wards belonged to this cluster. Phylogenetic comparison with published sequences confirmed that the cluster was NRCS-A outbreak strain but found a relatively high genomic diversity (mean pairwise distance of 84.9 SNPs) and an estimated NRCS-A S. capitis molecular clock of 5.1 SNPs/genome/year (95% credibility interval 4.3–5.9). The presence of S. capitis in superficial cultures did not correlate with neonatal bacteraemia, but both neonates with rectal NRCS-A S. capitis carriage identified also experienced S. capitis bacteraemia.
Conclusions. S. capitis bacteraemia occurred in patients with longer NICU admission than other CoNS. Genomic analysis confirms clinically significant infections with the NRCS-A S. capitis strain, distinct from non-NICU clinical samples. Multiple introductions of S. capitis, rather than prolonged environmental persistence, were seen over 5 years of infections.
-
-
-
Genomic epidemiology of extended-spectrum beta-lactamase-producing Escherichia coli from humans and a river in Aotearoa New Zealand
In Aotearoa New Zealand, urinary tract infections in humans are commonly caused by extended-spectrum beta-lactamase (ESBL)-producing Escherichia coli. This group of antimicrobial-resistant bacteria are often multidrug resistant. However, there is limited information on ESBL-producing E. coli found in the environment and their link with human clinical isolates. In this study, we examined the genetic relationship between environmental and human clinical ESBL-producing E. coli isolates collected in parallel within the same area over 14 months. Environmental samples were collected from treated effluent, stormwater and multiple locations along an Aotearoa New Zealand river. Treated effluent, stormwater and river water sourced downstream of the treated effluent outlet were the main samples that were positive for ESBL-producing E. coli (7/14 samples, 50.0%; 3/6 samples, 50%; and 15/28 samples, 54%, respectively). Whole-genome sequence comparison was carried out on 307 human clinical and 45 environmental ESBL-producing E. coli isolates. Sequence type 131 was dominant for both clinical (147/307, 47.9%) and environmental isolates (11/45, 24.4%). Only one ESBL gene was detected in each isolate. Among the clinical isolates, the most prevalent ESBL genes were blaCTX-M-27 (134/307, 43.6%) and blaCTX-M-15 (134/307, 43.6%). Among the environmental isolates, blaCTX-M-15 (28/45, 62.2%) was the most prevalent gene. A core SNP analysis of these isolates suggested that some strains were shared between humans and the local river. These results highlight the importance of understanding different transmission pathways for the spread of ESBL-producing E. coli.
-
-
-
Clostridioides difficile recovered from hospital patients, livestock and dogs in Nigeria share near-identical genome sequences
Genomic data on Clostridioides difficile from the African continent are currently lacking, resulting in the region being under-represented in global analyses of C. difficile infection (CDI) epidemiology. For the first time in Nigeria, we utilized whole-genome sequencing and phylogenetic tools to compare C. difficile isolates from diarrhoeic human patients (n=142), livestock (n=38), poultry manure (n=5) and dogs (n=9) in the same geographic area (Makurdi, north-central Nigeria) and relate them to the global C. difficile population. In addition, selected isolates were tested for antimicrobial susceptibility (n=33) and characterized by PCR ribotyping (n=53). Hierarchical clustering of core-genome multilocus sequence typing (cgMLST) allelic profiles revealed large diversity at the level HC150 (i.e. clusters of related genomes with maximally 150 pairwise allelic differences), which was previously shown to correlate with PCR ribotypes (RT). While several globally disseminated strains were detected, including HC150_1 (associated with RT078), HC150_3 (RT001) and HC150_3622 (RT014), 42 HC150 clusters (79%) represented unique genotypes that were new to the public genomic record, and 16 (30%) of these were novel PCR ribotypes. Considerable proportions of the C. difficile isolates displayed resistance to fluoroquinolones, macrolides and linezolid, potentially reflecting human and animal antibiotic consumption patterns in the region. Notably, our comparative phylogenomic analyses revealed human–human, human–livestock and farm–farm sharing of near-identical C. difficile genomes (≤2 core-genome allelic differences), suggesting the continued spread of multiple strains across human and animal (pig, poultry, cattle and dog) host populations. Our findings highlight the interconnectivity between livestock production and the epidemiology of human CDI and inform the need for increased CDI awareness among clinicians in this region. A large proportion of C. difficile strains appeared to be unique to the region, reflecting both the significant geographic patterning present in the C. difficile population and a general need for additional pathogen sequencing data from Africa.
-
- Evolution and Responses to Interventions
-
-
Evolution and spread of Xanthomonas citri subsp. citri in the São Paulo, Brazil, citrus belt inferred from 758 novel genomes
The São Paulo state citrus belt in Brazil is a major citrus production region. Since at least 1957, citrus plantations in this region have been affected by citrus canker, an economically damaging disease caused by Xanthomonas citri subsp. citri (Xcc). For about 50 years, until 2017, a citrus canker eradication programme was carried out in this region. In this work, our aim was to investigate the effects of the eradication programme on genetic variability and evolution of Xcc. To this end, we sequenced and analysed 758 Xcc genomes sampled in the São Paulo citrus belt, together with 730 publicly available Xcc genomes from around the world. Our phylogenomic analyses show that these genomes can be grouped into seven major lineages and that in São Paulo, lineage L7 is dominant. Our time estimate for its appearance closely matches the date when citrus production expanded. L7 can be subdivided into lineages L7.1 and L7.2. In our samples, L7.2, which we estimate to have emerged around 1964, is by far the most abundant, showing that the eradication programme had little impact on strain diversification. On the other hand, oscillations in the estimated effective population size of L7.2 strains over time closely match the shifts in the eradication programme. In sum, we present a detailed view of the genomic diversity of Xcc in the world and in São Paulo, the largest such effort in terms of the number of genomes for a crop pathogen undertaken so far. The methods employed here can form the basis for active genomic surveillance of Xcc in major citrus production areas.