Volume 10, Issue 8, 2024
- Research Articles
- Genomic Methodologies
Development of the Pneumococcal Genome Library, a core genome multilocus sequence typing scheme, and a taxonomic life identification number barcoding system to investigate and define pneumococcal population structure
Investigating the genomic epidemiology of major bacterial pathogens is integral to understanding transmission, evolution, colonization, disease, antimicrobial resistance and vaccine impact. Furthermore, the recent accumulation of large numbers of whole genome sequences for many bacterial species facilitates the development of robust genome-wide typing schemes that define the overall bacterial population structure and the lineages within it. Using previously published data, we developed the Pneumococcal Genome Library (PGL), a curated dataset of 30 976 genomes and contextual data for carriage and disease pneumococci recovered between 1916 and 2018 in 82 countries. We leveraged the size and diversity of the PGL to develop a core genome multilocus sequence typing (cgMLST) scheme comprising 1222 loci. Finally, using multilevel single-linkage clustering, we stratified pneumococci into hierarchical clusters based on allelic similarity thresholds and defined these with a taxonomic life identification number (LIN) barcoding system. The PGL, cgMLST scheme and LIN barcodes represent a high-quality genomic resource and fine-scale clustering approaches for the analysis of pneumococcal populations, which support the genomic epidemiology and surveillance of this leading global pathogen.
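The multilevel single-linkage clustering and barcoding described above can be illustrated with a small sketch. It assumes allelic profiles are lists of integer allele IDs (None for a missing locus), counts pairwise allelic mismatches, forms single-linkage clusters at each threshold, and numbers clusters within their parent cluster to build a hierarchical barcode. This is a simplified, static stand-in for LIN codes, not the PGL implementation: the actual LIN system assigns codes incrementally as genomes are added, and the profiles and thresholds below are hypothetical.

```python
from itertools import combinations


def allelic_distance(a, b):
    """Count loci with different allele calls, skipping loci missing (None) in either profile."""
    return sum(1 for x, y in zip(a, b) if x is not None and y is not None and x != y)


def single_linkage(ids, dist, threshold):
    """Map each isolate to a cluster root; isolates linked by any chain of
    pairwise distances <= threshold share a root (union-find)."""
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in combinations(ids, 2):
        if dist[(a, b)] <= threshold:
            parent[find(a)] = find(b)
    return {i: find(i) for i in ids}


def lin_like_barcodes(profiles, thresholds):
    """Assign hierarchical barcodes with one position per threshold, loosest first.
    Single-linkage clusters are nested across thresholds, so each position
    numbers clusters 0, 1, 2, ... within the cluster named by the prefix."""
    ids = sorted(profiles)
    dist = {(a, b): allelic_distance(profiles[a], profiles[b])
            for a, b in combinations(ids, 2)}
    barcodes = {i: () for i in ids}
    for t in sorted(thresholds, reverse=True):
        roots = single_linkage(ids, dist, t)
        counters, labels = {}, {}
        for i in ids:
            prefix = barcodes[i]
            key = (prefix, roots[i])
            if key not in labels:
                labels[key] = counters.get(prefix, 0)
                counters[prefix] = labels[key] + 1
            barcodes[i] = prefix + (labels[key],)
    return barcodes


# Hypothetical four-locus profiles (a real cgMLST profile has over a thousand loci).
profiles = {
    "iso1": [1, 1, 1, 1],
    "iso2": [1, 1, 1, 2],
    "iso3": [2, 2, 1, 2],
    "iso4": [3, 3, 3, 3],
}
print(lin_like_barcodes(profiles, thresholds=[3, 1, 0]))
# {'iso1': (0, 0, 0), 'iso2': (0, 0, 1), 'iso3': (0, 1, 0), 'iso4': (1, 0, 0)}
```

Reading a barcode left to right moves from coarse to fine resolution: iso1 and iso2 share the prefix (0, 0) because they differ at only one locus, while iso4 diverges at the first, loosest threshold.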
- Functional Genomics and Microbe–Niche Interactions
Complete genome of the Medicago anthracnose fungus, Colletotrichum destructivum, reveals a mini-chromosome-like region within a core chromosome
Colletotrichum destructivum (Cd) is a phytopathogenic fungus causing significant economic losses on forage legume crops (Medicago and Trifolium species) worldwide. To gain insights into the genetic basis of fungal virulence and host specificity, we sequenced the genome of an isolate from Medicago sativa using long-read (PacBio) technology. The resulting genome assembly has a total length of 51.7 Mb and comprises ten core chromosomes and two accessory chromosomes, all of which were sequenced from telomere to telomere. A total of 15 631 gene models were predicted, including genes encoding potentially pathogenicity-related proteins such as candidate secreted effectors (484), key secondary metabolism enzymes (110) and carbohydrate-active enzymes (619). Synteny analysis revealed extensive structural rearrangements in the genome of Cd relative to the closely related Brassicaceae pathogen, Colletotrichum higginsianum. In addition, a 1.2 Mb species-specific region was detected within the largest core chromosome of Cd that has all the characteristics of fungal accessory chromosomes (transposon-rich, gene-poor, distinct codon usage), providing evidence for exchange between these two genomic compartments. This region was also unique in having undergone extensive intra-chromosomal segmental duplications. Our findings provide insights into the evolution of accessory regions and possible mechanisms for generating genetic diversity in this asexual fungal pathogen.
- Metagenomics and Microbiomes
Detection of novel orthoparamyxoviruses, orthonairoviruses and an orthohepevirus in European white-toothed shrews
While the viromes and immune systems of bats and rodents have been extensively studied, comprehensive data are lacking for insectivores (order Eulipotyphla) despite their wide geographic distribution. Anthropogenic land use and outdoor recreational activities, as well as changes in the range of shrews, may lead to an expansion of the human–shrew interface with the risk of spillover infections, as reported for Borna disease virus 1. We investigated the virome of 45 individuals of 4 white-toothed shrew species present in Europe, using metagenomic RNA sequencing of tissue and intestine pools. Moderate to high abundances of sequences related to the families Paramyxoviridae, Nairoviridae, Hepeviridae and Bornaviridae were detected. Whole genomes were determined for novel orthoparamyxoviruses (n=3), orthonairoviruses (n=2) and an orthohepevirus. The novel paramyxovirus, tentatively named Hasua virus, was phylogenetically related to the zoonotic Langya virus and Mòjiāng virus. The novel orthonairoviruses, along with the potentially zoonotic Erve virus, fall within the shrew-borne Thiafora virus genogroup. The highest viral RNA loads were detected in the kidneys for orthoparamyxoviruses, in well-perfused organs for orthonairoviruses and in the liver and intestine for the orthohepevirus, indicating potential transmission routes. Notably, several shrews were found to be coinfected with viruses from different families. Our study highlights the virus diversity present in shrews, not only in biodiversity-rich regions but also in areas influenced by human activity. This study warrants further research to characterize and assess the clinical implications and risk of these viruses and the importance of shrews as reservoirs in European ecosystems.
Capturing clinically relevant Campylobacter attributes through direct whole genome sequencing of stool
Campylobacter is the leading bacterial cause of infectious intestinal disease, but the pathogen typically accounts for a very small proportion of the overall stool microbiome in each patient. Diagnosis is further complicated by the fastidious nature of Campylobacter in the laboratory. This has, in part, driven a shift in recent years from culture-based to rapid PCR-based diagnostic assays, which have improved detection but created a knowledge gap in our clinical and epidemiological understanding of Campylobacter genotypes, because no isolates are available to sequence. In this study, direct metagenomic sequencing was used to assess whether metagenome sequences could replace isolate genome sequences, and metagenomic sequencing outputs were used to describe clinically relevant attributes of Campylobacter genotypes. A total of 37 diarrhoeal stool samples with Campylobacter and five samples with an unknown pathogen result were collected and processed with and without filtration; DNA was extracted, and metagenomes were sequenced by short-read sequencing. Culture-based methods were used to validate Campylobacter metagenome-derived genome (MDG) results. Sequence output metrics were assessed for Campylobacter genome quality and accuracy of characterization. Of the 42 samples passing quality checks for analysis, identification of Campylobacter to the genus and species level was dependent on Campylobacter genome read count, coverage and genome completeness. A total of 65% (24/37) of samples were reliably identified to the genus level through Campylobacter MDG, 73% (27/37) by culture and 97% (36/37) by qPCR. The Campylobacter genomes with a genome completeness of over 60% (n=21) were all accurately identified at the species level (100%). Of those, 72% (15/21) were identified to sequence types (STs), and for 95% (20/21) antimicrobial resistance (AMR) gene determinants were accurately identified.
Filtration of stool samples enhanced Campylobacter MDG recovery and genome quality metrics compared with the corresponding unfiltered samples, which improved the identification of STs and AMR profiles. Phylogenetic analysis showed that metagenome-derived genomes clustered with culture-derived genomes, supporting the reliability of genomes obtained by direct stool sequencing. Furthermore, when Campylobacter genomes were spiked into metagenomes at 0 to 2% of total metagenome abundance and sequenced on an ONT MinION configured for adaptive sequencing, assembly quality was better and STs were accurately identified, particularly for metagenomes containing 2 and 1% Campylobacter jejuni genomes. Direct sequencing of Campylobacter from stool samples provides clinically relevant and epidemiologically important genomic information without reliance on cultured isolates.
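Why read count, coverage and completeness track the abundance of Campylobacter in a stool metagenome can be seen with back-of-envelope arithmetic. The sketch below assumes sequenced bases are drawn in proportion to relative abundance and applies the idealized Lander-Waterman expectation that a fraction 1 - exp(-C) of a genome is covered at mean depth C. The run size and the roughly 1.6 Mb genome size are hypothetical inputs, and this idealized coverage fraction is not necessarily how genome completeness was measured in the study.

```python
import math


def expected_depth(total_bases, target_fraction, genome_size):
    """Mean sequencing depth of one genome in a metagenome, assuming the
    sequenced bases are drawn in proportion to its relative abundance."""
    return total_bases * target_fraction / genome_size


def expected_fraction_covered(depth):
    """Idealized Lander-Waterman estimate of the fraction of a genome hit by
    at least one read when reads land uniformly at random: 1 - exp(-depth)."""
    return 1.0 - math.exp(-depth)


# Hypothetical run: 5 Gb of sequence, Campylobacter at 0.1-2 % abundance,
# and an assumed genome size of about 1.6 Mb.
for fraction in (0.001, 0.01, 0.02):
    depth = expected_depth(5e9, fraction, 1.6e6)
    print(f"{fraction:.1%} abundance: {depth:.1f}x depth, "
          f"~{expected_fraction_covered(depth):.0%} of genome covered")
```

Real stool metagenomes deviate from this ideal (host DNA, uneven coverage, assembly breaks at repeats), so the numbers are an upper-bound intuition rather than a prediction.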
Phylogenetic diversity of putative nickel-containing carbon monoxide dehydrogenase-encoding prokaryotes in the human gut microbiome
Although the production of carbon monoxide (CO) within the human body has been detected, only two CO-utilizing prokaryotes (CO utilizers) have been reported in the human gut. Therefore, the phylogenetic diversity of human gut CO-utilizing prokaryotes remains unclear. Here, we identified more than a thousand representative genomes containing genes for putative nickel-containing CO dehydrogenase (pCODH), an essential enzyme for CO utilization. The taxonomy of genomes encoding pCODH was expanded to include 8 phyla, comprising 82 genera and 248 species. In contrast, putative molybdenum-containing CODH genes were not detected in the human gut microbial genomes. pCODH transcripts were detected in 97.3 % (n=110) of public metatranscriptome datasets derived from healthy human faeces, suggesting the ubiquitous presence of prokaryotes bearing transcriptionally active pCODH genes in the human gut. More than half of the pCODH-encoding genomes contain a set of genes for the autotrophic Wood–Ljungdahl pathway (WLP). However, 79 % of these genomes lack a key WLP gene, which encodes the enzyme that synthesizes formate from CO2, suggesting that potential human gut CO-utilizing prokaryotes share a degenerate WLP gene set. In the other half of the pCODH-encoding genomes, seven genes, including putative genes for flavin adenine dinucleotide-dependent NAD(P) oxidoreductase (FNOR), ABC transporter and Fe-hydrogenase, were found adjacent to the pCODH gene. None of the putative genes associated with CO-oxidizing respiratory machinery, such as energy-converting hydrogenase genes, were found in pCODH-encoding genomes. This suggests that CO utilization in the human gut is not for CO removal, but potentially for fixation and/or biosynthesis, consistent with the harmless yet continuous production of CO in the human gut.
Our findings reveal the diversity and distribution of prokaryotes with pCODH in the human gut microbiome, suggesting their potential contribution to microbial ecosystems in human gut environments.
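Finding genes "adjacent to the pCODH gene" is a gene-neighbourhood search. A minimal sketch, assuming each contig is an ordered list of (locus tag, annotation) pairs and that "adjacent" means within a fixed number of genes on either side; the window size, locus tags and annotations are hypothetical, and strand and intergenic distance are ignored for simplicity.

```python
def neighbourhood(contig, target, window=3):
    """Return {locus_tag: annotation} for genes within `window` positions of any
    gene annotated as `target` on an ordered contig (strand is ignored)."""
    hits = [i for i, (_, annotation) in enumerate(contig) if annotation == target]
    found = {}
    for i in hits:
        start, stop = max(0, i - window), min(len(contig), i + window + 1)
        for j in range(start, stop):
            if j != i:
                tag, annotation = contig[j]
                found[tag] = annotation
    return found


# Hypothetical contig; annotations only loosely mirror those named in the abstract.
contig = [
    ("c1_001", "hypothetical protein"),
    ("c1_002", "ABC transporter"),
    ("c1_003", "FNOR"),
    ("c1_004", "pCODH"),
    ("c1_005", "Fe-hydrogenase"),
    ("c1_006", "tRNA ligase"),
    ("c1_007", "ribosomal protein"),
]
print(neighbourhood(contig, "pCODH", window=2))
```

Run over many genomes, counting how often each neighbouring annotation recurs is what separates a conserved gene cluster from genes that sit next to pCODH by chance.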
- Pathogens and Epidemiology
Whole-genome sequencing of Western Canadian Borrelia spp. collected from diverse tick and animal hosts reveals short-lived local genotypes interspersed with longer-lived continental genotypes
Changing climates are allowing the geographic expansion of ticks and their animal hosts, increasing the risk of Borrelia-caused zoonoses in Canada. However, little is known about the genomic diversity of Borrelia west of the Canadian Rockies and in the tick vectors Ixodes pacificus, Ixodes auritulus and Ixodes angustus. Here, we report the whole-genome shotgun sequences of 51 Borrelia isolates from multiple tick species collected on a range of animal hosts between 1993 and 2016, located primarily in coastal British Columbia. The bacterial isolates represented three different species from the Lyme disease-causing Borrelia burgdorferi sensu lato genospecies complex [Borrelia burgdorferi sensu stricto (n=47), Borrelia americana (n=3) and Borrelia bissettiae (n=1)]. The traditional eight-gene multi-locus sequence typing (MLST) strategy was applied to facilitate comparisons across studies. This identified 13 known Borrelia sequence types (STs), established 6 new STs, and assigned 5 novel types to the nearest sequence types. B. burgdorferi s. s. isolates were further differentiated into ten ospC types, plus one novel ospC with less than 92 % nucleotide identity to all previously defined ospC types. The MLST types that were resampled over extended time periods belonged to previously described STs that are distributed across North America. The most geographically widespread ST, ST.12, was isolated from all three tick species. Conversely, new B. burgdorferi s. s. STs from Vancouver Island and the Vancouver region were only detected for short periods, revealing a surprising transience in space, time and host tick species, possibly due to displacement by longer-lived genotypes that expanded across North America.
This article contains data hosted by Microreact.
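The ospC criterion above (a novel type at less than 92 % nucleotide identity to every defined type) amounts to a nearest-reference search with an identity cutoff. A minimal sketch, assuming sequences are already aligned to equal length and that identity is computed over gap-free columns; the alignment method and gap handling used in the study are not stated here, and the sequences are toy data.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    pairs = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def assign_ospc_type(query, references, threshold=92.0):
    """Return (closest type, identity); the type is 'novel' when identity to
    every reference falls below the threshold."""
    best_type, best_identity = None, -1.0
    for name, ref in references.items():
        identity = percent_identity(query, ref)
        if identity > best_identity:
            best_type, best_identity = name, identity
    label = best_type if best_identity >= threshold else "novel"
    return label, best_identity


# Toy 100-nt "alleles"; real ospC types would be aligned full-length sequences.
references = {"A": "ACGT" * 25, "B": "TGCA" * 25}
close_variant = "TTT" + references["A"][3:]   # 3 mismatches to type A
divergent = "T" * 20 + references["A"][20:]   # 15 mismatches to type A
print(assign_ospc_type(close_variant, references))  # ('A', 97.0)
print(assign_ospc_type(divergent, references))      # ('novel', 85.0)
```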
Molecular characterization of Streptococcus pyogenes (StrepA) non-invasive isolates during the 2022–2023 UK upsurge
From late 2022 into early 2023, the UK Health Security Agency reported unusually high levels of scarlet fever and invasive disease caused by Streptococcus pyogenes (StrepA or group A Streptococcus). During this time, we collected and genome-sequenced 341 non-invasive throat and skin S. pyogenes isolates identified during routine clinical diagnostic testing in Sheffield, a large UK city. We compared these data with those obtained from a similar collection of 165 isolates from 2016 to 2017. Numbers of throat-associated isolates collected peaked in early December 2022, reflecting the national scarlet fever upsurge, while skin infections peaked later in December. The most common emm-types in 2022–2023 were emm1 (28.7 %), emm12 (24.9 %) and emm22 (7.7 %) in throat and emm1 (22 %), emm12 (10 %), emm76 (18 %) and emm49 (7 %) in skin. While all emm1 isolates were the M1UK lineage, the comparison with 2016–2017 revealed diverse lineages in other emm-types, including emm12, and emergent lineages within other types including a new acapsular emm75 lineage, demonstrating that the upsurge was not completely driven by a single genotype. Analysis of the capsule locus predicted that only 51 % of throat isolates would produce capsule compared with 78 % of skin isolates. Ninety per cent of throat isolates were also predicted to have high NADase and streptolysin O (SLO) expression, based on the promoter sequence, compared with only 56 % of skin isolates. Our study highlights the value of analysing non-invasive isolates to characterize tissue tropisms, as well as changing strain diversity and emerging genomic features, which may have implications for spillover into invasive disease and future S. pyogenes upsurges.
Comparison of gene-by-gene and genome-wide short nucleotide sequence-based approaches to define the global population structure of Streptococcus pneumoniae
Defining the population structure of a pathogen is a key part of epidemiology, as genomically related isolates are likely to share key clinical features such as antimicrobial resistance profiles and invasiveness. Multiple different methods are currently used to cluster closely related genomes, potentially leading to inconsistency between studies. Here, we use a global dataset of 26 306 Streptococcus pneumoniae genomes to compare four clustering methods: gene-by-gene seven-locus MLST, core genome MLST (cgMLST)-based hierarchical clustering (HierCC) assignments, life identification number (LIN) barcoding and k-mer-based PopPUNK clustering (known as GPSCs in this species). We compare the clustering results with phylogenetic and pan-genome analyses to assess their relationship with genome diversity and evolution, as we would expect a good clustering method to form a single monophyletic cluster that has high within-cluster similarity of genomic content. We show that the four methods are generally able to accurately reflect the population structure based on these metrics and that the methods were broadly consistent with each other. We then examined the discrepancies between clusters in more detail. The greatest concordance was seen between LIN barcoding and HierCC (adjusted mutual information score=0.950), as expected given that both methods use cgMLST, although they differ in how an individual cluster is defined and in their core genome schemes. However, the differences that remain between the two methods show that the selection of a core genome scheme can introduce inconsistencies between studies. GPSC and HierCC assignments were also highly concordant (AMI=0.946), showing that k-mer-based methods, which use the whole genome and do not require the careful selection of a core genome scheme, are just as effective at representing the population structure.
Additionally, where there were differences in clustering between these methods, these could be explained by differences in the accessory genome that are not captured by cgMLST. We conclude that for S. pneumoniae, standardized and stable nomenclature is important as the number of available genomes expands. Furthermore, the research community should transition away from seven-locus MLST, whilst cgMLST, GPSC and LIN assignments should be used more widely. However, to allow easy comparison between studies and to keep previous literature relevant, the reporting of multiple clustering names should be standardized within the research community.
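The adjusted mutual information (AMI) scores quoted above compare two clusterings of the same genomes while correcting for the agreement expected by chance. A pure-Python sketch follows, assuming the common arithmetic-mean normalization (the default in scikit-learn's adjusted_mutual_info_score); which normalization the study used is an assumption here, and the toy labels merely stand in for, e.g., GPSC and HierCC assignments.

```python
from collections import Counter
from math import exp, lgamma, log


def _entropy(sizes, n):
    return -sum((s / n) * log(s / n) for s in sizes)


def adjusted_mutual_info(labels_u, labels_v):
    """AMI = (MI - E[MI]) / (mean(H(U), H(V)) - E[MI]), with E[MI] taken under a
    hypergeometric model of random labelings that keep the same cluster sizes."""
    if len(labels_u) != len(labels_v):
        raise ValueError("both clusterings must label the same genomes")
    n = len(labels_u)
    a = Counter(labels_u)                     # cluster sizes, method U
    b = Counter(labels_v)                     # cluster sizes, method V
    table = Counter(zip(labels_u, labels_v))  # contingency table n_ij
    mi = sum((c / n) * log(n * c / (a[u] * b[v])) for (u, v), c in table.items())

    emi = 0.0
    for ai in a.values():
        for bj in b.values():
            log_norm = (lgamma(ai + 1) + lgamma(bj + 1) + lgamma(n - ai + 1)
                        + lgamma(n - bj + 1) - lgamma(n + 1))
            for k in range(max(1, ai + bj - n), min(ai, bj) + 1):
                log_p = log_norm - (lgamma(k + 1) + lgamma(ai - k + 1)
                                    + lgamma(bj - k + 1) + lgamma(n - ai - bj + k + 1))
                emi += (k / n) * log(n * k / (ai * bj)) * exp(log_p)

    denom = (_entropy(a.values(), n) + _entropy(b.values(), n)) / 2 - emi
    if abs(denom) < 1e-15:
        return 1.0  # degenerate case, e.g. both clusterings are a single cluster
    return (mi - emi) / denom


print(adjusted_mutual_info([0, 0, 1, 1], ["x", "x", "y", "y"]))  # same partition, relabelled
print(adjusted_mutual_info([0, 0, 1, 1], [0, 1, 0, 1]))          # agreement no better than chance
```

Because AMI depends only on which genomes are grouped together, cluster names do not need to match between methods; the chance correction is what lets a score such as 0.950 be read as near-identical partitions rather than an artefact of many small clusters.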
Comparative genomic analysis identifies potential adaptive variation in Mycoplasma ovipneumoniae
Kimberly R. Andrews, Thomas E. Besser, Thibault Stalder, Eva M. Top, Katherine N. Baker, Matthew W. Fagnan, Daniel D. New, G. Maria Schneider, Alexandra Gal, Rebecca Andrews-Dickert, Samuel S. Hunter, Kimberlee B. Beckmen, Lauren Christensen, Anne Justice-Allen, Denise Konetchy, Chadwick P. Lehman, Kezia Manlove, Hollie Miyasaki, Todd Nordeen, Annette Roug and E. Frances Cassirer
Mycoplasma ovipneumoniae is associated with respiratory disease in wild and domestic Caprinae globally, with wide variation in disease outcomes within and between host species. To gain insight into phylogenetic structure and mechanisms of pathogenicity for this bacterial species, we compared M. ovipneumoniae genomes for 99 samples from 6 countries (Australia, Bosnia and Herzegovina, Brazil, China, France and USA) and 4 host species (domestic sheep, domestic goats, bighorn sheep and caribou). Core genome sequences of M. ovipneumoniae assemblies from domestic sheep and goats fell into two well-supported phylogenetic clades that are divergent enough to be considered different bacterial species, consistent with each of these two clades having an evolutionary origin in separate host species. Genome assemblies from bighorn sheep and caribou also fell within these two clades, indicating multiple spillover events, most commonly from domestic sheep. Pangenome analysis indicated a high percentage (91.4 %) of accessory genes (i.e. genes found only in a subset of assemblies) compared to core genes (i.e. genes found in all assemblies), potentially indicating a propensity for this pathogen to adapt to within-host conditions. In addition, many genes related to carbon metabolism, which is a virulence factor for Mycoplasmas, showed evidence for homologous recombination, a potential signature of adaptation. The presence or absence of annotated genes was very similar between sheep and goat clades, with only two annotated genes significantly clade-associated. However, three M. ovipneumoniae genome assemblies from asymptomatic caribou in Alaska formed a highly divergent subclade within the sheep clade that lacked 23 annotated genes compared to other assemblies, and many of these genes had functions related to carbon metabolism. Overall, our results suggest that adaptation of M. ovipneumoniae has involved evolution of carbon metabolism pathways and virulence mechanisms related to those pathways. The genes involved in these pathways, along with other genes identified as potentially involved in virulence in this study, are potential targets for future investigation into a possible genomic basis for the high variation observed in disease outcomes within and between wild and domestic host species.
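The core/accessory split in the pangenome analysis follows directly from the definitions given above (core genes in all assemblies, accessory genes in only a subset). A minimal sketch, assuming gene presence/absence has already been inferred and is given as one set of gene-cluster names per assembly; it uses the strict "all assemblies" definition from the text, expresses the accessory share as a fraction of the whole pangenome (an assumption about the denominator), and reuses the same set logic to find genes missing from a subclade, as in the caribou comparison. Assembly and gene names are hypothetical.

```python
def partition_pangenome(presence):
    """Split genes into core (present in every assembly) and accessory
    (present in some but not all). `presence` maps assembly -> set of genes."""
    all_genes = set().union(*presence.values())
    core = set.intersection(*presence.values())
    return core, all_genes - core


def genes_lost_in(presence, subset):
    """Genes found in every assembly outside `subset` but in none inside it."""
    outside = [genes for name, genes in presence.items() if name not in subset]
    inside = [presence[name] for name in subset]
    return set.intersection(*outside) - set().union(*inside)


# Hypothetical presence/absence data for four assemblies.
presence = {
    "sheep_1": {"gA", "gB", "gC", "gD"},
    "sheep_2": {"gA", "gB", "gC"},
    "goat_1": {"gA", "gB", "gE"},
    "caribou_1": {"gA", "gE"},
}
core, accessory = partition_pangenome(presence)
share = 100 * len(accessory) / (len(core) + len(accessory))
print(sorted(core), sorted(accessory), f"{share:.1f} % accessory")
print(genes_lost_in(presence, {"caribou_1"}))
```

Pangenome tools often relax "all assemblies" to a high threshold such as 99 % to tolerate assembly gaps; with the strict definition, a single incomplete assembly can move a gene from core to accessory.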
Development and implementation of a core genome multilocus sequence typing scheme for Haemophilus influenzae
Haemophilus influenzae is part of the human nasopharyngeal microbiota and a pathogen causing invasive disease. The extensive genetic diversity observed in H. influenzae necessitates discriminatory analytical approaches to evaluate its population structure. This study developed a core genome multilocus sequence typing (cgMLST) scheme for H. influenzae using pangenome analysis tools and validated the cgMLST scheme using datasets consisting of complete reference genomes (N = 14) and high-quality draft H. influenzae genomes (N = 2297). The draft genome dataset was divided into a development dataset (N = 921) and a validation dataset (N = 1376). The development dataset was used to identify potential core genes, and the validation dataset was used to refine the final core gene list to ensure the reliability of the proposed cgMLST scheme. Functional classifications were made for all the resulting core genes. Phylogenetic analyses were performed using both allelic profiles and nucleotide sequence alignments of the core genome to test congruence, as assessed by Spearman's correlation and ordinary least squares linear regression. Preliminary analyses using the development dataset identified 1067 core genes, which were refined to 1037 with the validation dataset. More than 70% of core genes were predicted to encode proteins essential for metabolism or genetic information processing. Phylogenetic and statistical analyses indicated that the core genome allelic profile accurately represented phylogenetic relatedness among the isolates (R² = 0.945). We used this cgMLST scheme to define a high-resolution population structure for H. influenzae, which enhances the genomic analysis of this clinically relevant human pathogen.
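The congruence tests named above (Spearman's correlation and an ordinary least squares fit with its R²) can be sketched on paired distances. This assumes congruence is assessed by comparing, for the same isolate pairs, a distance from allelic profiles with a distance from the core genome alignment; the exact quantities regressed in the study are not specified here, and the numbers are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def ols_fit(x, y):
    """Ordinary least squares fit of y = intercept + slope * x; returns (slope, intercept, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical pairwise distances for five isolate pairs.
allele_differences = [3, 10, 25, 40, 120]   # from cgMLST allelic profiles
snp_distances = [5, 14, 31, 60, 150]        # from the core genome alignment
print(spearman(allele_differences, snp_distances))  # 1.0
print(ols_fit(allele_differences, snp_distances))
```

Spearman's rho checks only that the two distance orderings agree, while R² also requires the relationship to be close to linear, which is why reporting both gives a fuller picture of congruence.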
