Volume 10, Issue 6, 2024
- Bioresources
- Genomic Methodologies
PHA4GE quality control contextual data tags: standardized annotations for sharing public health sequence datasets with known quality issues to facilitate testing and training
Emma J. Griffiths, Inês Mendes, Finlay Maguire, Jennifer L. Guthrie, Bryan A. Wee, Sarah Schmedes, Kathryn Holt, Chanchal Yadav, Rhiannon Cameron, Charlotte Barclay, Damion Dooley, Duncan MacCannell, Leonid Chindelevitch, Ilene Karsch-Mizrachi, Zahra Waheed, Lee Katz, Robert Petit III, Mugdha Dave, Paul Oluniyi, Muhammad Ibtisam Nasar, Amogelang Raphenya, William W. L. Hsiao and Ruth E. Timme
As public health laboratories expand their genomic sequencing and bioinformatics capacity for the surveillance of different pathogens, labs must carry out robust validation, training, and optimization of wet- and dry-lab procedures. Achieving these goals for algorithms, pipelines and instruments often requires that lower quality datasets be made available for analysis and comparison alongside those of higher quality. This range of data quality in reference sets can complicate the sharing of sub-optimal datasets that are vital for the community and for the reproducibility of assays. Sharing useful but sub-optimal datasets requires careful annotation and documentation of known issues so that the data can be interpreted appropriately, are not mistaken for better quality information, and (together with their derivatives) are easily identifiable in repositories. Unfortunately, there are currently no standardized attributes or mechanisms for tagging poor-quality datasets, or datasets generated for a specific purpose, to maximize their utility, searchability, accessibility and reuse. The Public Health Alliance for Genomic Epidemiology (PHA4GE) is an international community of scientists from public health, industry and academia focused on improving the reproducibility, interoperability, portability, and openness of public health bioinformatic software, skills, tools and data.
To address the challenges of sharing lower quality datasets, PHA4GE has developed a set of standardized contextual data tags, namely fields and terms, that can be included in public repository submissions as a means of flagging pathogen sequence data with known quality issues, increasing their discoverability. The contextual data tags were developed through consultations with the community, including input from the International Nucleotide Sequence Database Collaboration (INSDC), and have been standardized using ontologies, which are community-based resources for defining the tag properties and the relationships between them. The standardized tags are agnostic to the organism and the sequencing technique used and thus can be applied to data generated from any pathogen using an array of sequencing techniques. The tags can also be applied to synthetic (lab created) data. The list of standardized tags is maintained by PHA4GE and can be found at https://github.com/pha4ge/contextual_data_QC_tags. Definitions, ontology IDs, examples of use, as well as a JSON representation, are provided. The PHA4GE QC tags were tested, and are now implemented, by the FDA’s GenomeTrakr laboratory network as part of its routine submission process for SARS-CoV-2 wastewater surveillance. We hope that these simple, standardized tags will help improve communication regarding quality control in public repositories, in addition to making datasets of variable quality more easily identifiable. Suggestions for additional tags can be submitted to PHA4GE via the New Term Request Form in the GitHub repository. By providing a mechanism for feedback and suggestions, we also expect that the tags will evolve with the needs of the community.
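As a rough illustration of how such tags might travel with a submission record, the sketch below attaches QC annotations to a metadata dictionary and serializes it to JSON. The field names (`qc_determination`, `qc_issues`, `qc_method_name`), the values, and the sample record are hypothetical placeholders chosen for this example; they are not the PHA4GE fields or terms, which should be taken from the repository linked above.

```python
import json


def tag_record(record, issues, determination, method_name=None):
    """Return a copy of a metadata record with QC annotations attached.

    All field names written here are hypothetical placeholders, not the
    standardized PHA4GE tags.
    """
    tagged = dict(record)  # copy so the caller's record is left untouched
    tagged["qc_determination"] = determination
    tagged["qc_issues"] = sorted(set(issues))  # de-duplicate, stable order
    if method_name is not None:
        tagged["qc_method_name"] = method_name
    return tagged


def has_known_issues(record):
    """True when a record carries at least one QC issue annotation."""
    return bool(record.get("qc_issues"))


# Hypothetical sample record and issue labels, for illustration only.
sample = {"sample_id": "EXAMPLE-001", "organism": "SARS-CoV-2"}
tagged_sample = tag_record(
    sample,
    issues=["low coverage", "low coverage", "adapter contamination"],
    determination="fail",
)
print(json.dumps(tagged_sample, indent=2))
```

A filter such as `has_known_issues` is one way a downstream user might separate deliberately shared sub-optimal datasets from the rest of a collection.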
- Metagenomics and Microbiomes
GROND: a quality-checked and publicly available database of full-length 16S-ITS-23S rRNA operon sequences
Sequence comparison of 16S rRNA PCR amplicons is an established approach to taxonomically identify bacterial isolates and profile complex microbial communities. One potential application of recent advances in long-read sequencing technologies is to sequence entire rRNA operons and capture significantly more phylogenetic information compared to sequencing of the 16S rRNA (or regions thereof) alone, with the potential to increase the proportion of amplicons that can be reliably classified to lower taxonomic ranks. Here we describe GROND (Genome-derived Ribosomal Operon Database), a publicly available database of quality-checked 16S-ITS-23S rRNA operons, accompanied by multiple taxonomic classifications. GROND will aid researchers in analysis of their data and act as a standardised database to allow comparison of results between studies.
- Research Articles
- Genomic Methodologies
How low can you go? Short-read polishing of Oxford Nanopore bacterial genome assemblies
It is now possible to assemble near-perfect bacterial genomes using Oxford Nanopore Technologies (ONT) long reads, but short-read polishing is usually required for perfection. However, the effect of short-read depth on polishing performance is not well understood. Here, we introduce Pypolca (with default and careful parameters) and Polypolish v0.6.0 (with a new careful parameter). We then show that: (1) all polishers other than Pypolca-careful, Polypolish-default and Polypolish-careful commonly introduce false-positive errors at low read depth; (2) most of the benefit of short-read polishing occurs by 25× depth; (3) Polypolish-careful almost never introduces false-positive errors at any depth; and (4) Pypolca-careful is the single most effective polisher. Overall, we recommend the following polishing strategies: Polypolish-careful alone when depth is very low (<5×), Polypolish-careful and Pypolca-careful when depth is low (5–25×), and Polypolish-default and Pypolca-careful when depth is sufficient (>25×).
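The depth-based recommendation above can be written as a small lookup, sketched here under stated assumptions. The function name `recommend_polishers` is hypothetical, the returned strings are the labels used in the abstract rather than command-line flags, and the boundary handling (exactly 5× and exactly 25× both treated as "low") is one reading of the stated ranges.

```python
def recommend_polishers(short_read_depth):
    """Map short-read depth (fold coverage) to the polishers named in the abstract.

    Returns labels only; how each tool is invoked is outside this sketch.
    """
    if short_read_depth < 0:
        raise ValueError("depth must be non-negative")
    if short_read_depth < 5:
        # Very low depth (<5x): Polypolish-careful alone.
        return ["Polypolish-careful"]
    if short_read_depth <= 25:
        # Low depth (5-25x): both careful modes.
        return ["Polypolish-careful", "Pypolca-careful"]
    # Sufficient depth (>25x).
    return ["Polypolish-default", "Pypolca-careful"]


for depth in (3, 12, 40):
    print(depth, recommend_polishers(depth))
```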
Estimating geographical spread of Streptococcus pneumoniae within Israel using genomic data
Understanding how pathogens spread across geographical space is fundamental for control measures such as vaccination. Streptococcus pneumoniae (the pneumococcus) is a respiratory bacterium responsible for a large proportion of infectious disease morbidity and mortality globally. Even in the post-vaccination era, the rates of invasive pneumococcal disease (IPD) remain stable in most countries, including Israel. To understand the geographical spread of the pneumococcus in Israel, we analysed 1174 pneumococcal genomes from patients with IPD across multiple regions. We included the evolutionary distance between pairs of isolates inferred using whole-genome data within a relative risk (RR) ratio framework to capture the geographical structure of S. pneumoniae. While we could not find geographical structure at the overall lineage level, the extra granularity provided by whole-genome sequence data showed that it takes approximately 5 years for invasive pneumococcal isolates to become fully mixed across the country.
This article contains data hosted by Microreact.
- Functional Genomics and Microbe–Niche Interactions
Model-driven characterization of functional diversity of Pseudomonas aeruginosa clinical isolates with broadly representative phenotypes
Pseudomonas aeruginosa is a leading cause of infections in immunocompromised individuals and in healthcare settings. This study aims to understand the relationships between phenotypic diversity and the functional metabolic landscape of P. aeruginosa clinical isolates. To better understand the metabolic repertoire of P. aeruginosa in infection, we deeply profiled a representative set from a library of 971 clinical P. aeruginosa isolates with corresponding patient metadata and bacterial phenotypes. The genotypic clustering based on whole-genome sequencing of the isolates, multilocus sequence types, and the phenotypic clustering generated from a multi-parametric analysis were compared to each other to assess the genotype–phenotype correlation. Genome-scale metabolic network reconstructions were developed for each isolate through amendments to an existing PA14 network reconstruction. These network reconstructions show diverse metabolic functionalities and enhance the collective P. aeruginosa pangenome metabolic repertoire. Characterizing this rich set of clinical P. aeruginosa isolates allows for a deeper understanding of the genotypic and metabolic diversity of the pathogen in a clinical setting and lays a foundation for further investigation of the metabolic landscape of this pathogen and host-associated metabolic differences during infection.
Prevalence and diversity of TAL effector-like proteins in fungal endosymbiotic Mycetohabitans spp.
Endofungal Mycetohabitans (formerly Burkholderia) spp. rely on a type III secretion system to deliver mostly unidentified effector proteins when colonizing their host fungus, Rhizopus microsporus. The one known secreted effector family from Mycetohabitans consists of homologues of transcription activator-like (TAL) effectors, which are used by plant pathogenic Xanthomonas and Ralstonia spp. to activate host genes that promote disease. These ‘Burkholderia TAL-like (Btl)’ proteins bind corresponding specific DNA sequences in a predictable manner, but their genomic target(s) and impact on transcription in the fungus are unknown. Recent phenotyping of Btl mutants of two Mycetohabitans strains revealed that the single Btl in one Mycetohabitans endofungorum strain enhances fungal membrane stress tolerance, while others in a Mycetohabitans rhizoxinica strain promote bacterial colonization of the fungus. The phenotypic diversity underscores the need to assess the sequence diversity and, given that sequence diversity translates to DNA targeting specificity, the functional diversity of Btl proteins. Using a dual approach to maximize capture of Btl protein sequences for our analysis, we sequenced and assembled nine Mycetohabitans spp. genomes using long-read PacBio technology and also mined available short-read Illumina fungal–bacterial metagenomes. We show that btl genes are present across diverse Mycetohabitans strains from Mucoromycota fungal hosts yet vary in sequences and predicted DNA binding specificity. Phylogenetic analysis revealed distinct clades of Btl proteins and suggested that Mycetohabitans might contain more species than previously recognized. Within our data set, Btl proteins were more conserved across M. rhizoxinica strains than across M. endofungorum, but there was also evidence of greater overall strain diversity within the latter clade. Overall, the results suggest that Btl proteins contribute to bacterial–fungal symbioses in myriad ways.
- Pathogens and Epidemiology
Plasmid genomic epidemiology of carbapenem-hydrolysing class D β-lactamase (CDHL)-producing Enterobacterales in Canada, 2010−2021
Carbapenems are last-resort antibiotics for treatment of infections caused by multidrug-resistant Enterobacterales, but carbapenem resistance is a rising global threat due to the acquisition of carbapenemase genes. Oxacillinase-48 (bla OXA-48)-type carbapenemases are increasing in abundance in Canada and elsewhere; these genes are frequently found on mobile genetic elements and are associated with specific transposons. This means that alongside clonal dissemination, bla OXA-48-type genes can spread through plasmid-mediated horizontal gene transfer. We applied whole genome sequencing to characterize 249 bla OXA-48-type-producing Enterobacterales isolates collected by the Canadian Nosocomial Infection Surveillance Program from 2010 to 2021. Using a combination of short- and long-read sequencing, we obtained 70 complete and circular bla OXA-48-type-encoding plasmids. Using MOB-suite, four major plasmid clusters were identified, and we further estimated a plasmid cluster for 91.9 % (147/160) of incomplete bla OXA-48-type-encoding contigs. We identified different patterns of carbapenemase mobilization across Canada, including horizontal transmission of bla OXA-181/IncX3 plasmids (75/249, 30.1 %) and bla OXA-48/IncL/M plasmids (47/249, 18.9 %), and both horizontal transmission and clonal transmission of bla OXA-232 for Klebsiella pneumoniae ST231 on ColE2-type/ColKP3 plasmids (25/249, 10.0 %). Our findings highlight the diversity of OXA-48-type plasmids and indicate that multiple plasmid clusters and clonal transmission have contributed to bla OXA-48-type spread and persistence in Canada.
Expansion of pneumococcal serotype 23F and 14 lineages with genotypic changes in capsule polysaccharide locus and virulence gene profiles post introduction of pneumococcal conjugate vaccine in Blantyre, Malawi
Since the introduction of the 13-valent pneumococcal conjugate vaccine (PCV13) in Malawi in 2011, there has been persistent carriage of vaccine serotype (VT) Streptococcus pneumoniae, despite high vaccine coverage. To determine if there has been a genetic change within the VT capsule polysaccharide (cps) loci since the vaccine’s introduction, we compared 1022 whole-genome-sequenced VT isolates from 1998 to 2019. We identified the clonal expansion of a multidrug-resistant, penicillin non-susceptible serotype 23F GPSC14-ST2059 lineage, a serotype 14 GPSC9-ST782 lineage and a novel serotype 14 sequence type GPSC9-ST18728 lineage. Serotype 23F GPSC14-ST2059 had an I253T mutation within the capsule oligosaccharide repeat unit polymerase Wzy protein, which is predicted in silico to alter the protein pocket cavity. Moreover, serotype 23F GPSC14-ST2059 had SNPs in the DNA binding sites for the cps transcriptional repressors CspR and SpxR. Serotype 14 GPSC9-ST782 harbours a non-truncated version of the large repetitive protein (Lrp), containing a Cna protein B-type domain which is also present in proteins associated with infection and colonisation. These emergent lineages also harboured genes associated with antibiotic resistance, and the promotion of colonisation and infection which were absent in other lineages of the same serotype. Together these data suggest that in addition to serotype replacement, modifications of the capsule locus associated with changes in virulence factor expression and antibiotic resistance may promote vaccine escape. In summary, the study highlights that the persistence of vaccine serotype carriage despite high vaccine coverage in Malawi may be partly caused by expansion of VT lineages post-PCV13 rollout.
