Abstract

Identifying genetic variants associated with bacterial phenotypes, such as virulence, host preference and antimicrobial resistance, has great potential for improving our understanding of the mechanisms behind these traits. The availability of large collections of bacterial genomes has made genome-wide association studies (GWAS) a common approach for this purpose. However, the need to employ multiple software tools for data pre- and postprocessing limits the application of these methods to experienced bioinformaticians. To address this issue, we have developed a pipeline that performs bacterial GWAS from a set of assemblies and annotations, with multiple phenotypes as targets. The associations are run using five sets of genetic variants: unitigs, gene presence/absence, rare variants (i.e. gene burden test), gene-cluster-specific k-mers and all unitigs jointly. All variants passing the association threshold are further annotated to identify overrepresented biological processes and pathways. The results can be further augmented by generating a phylogenetic tree and predicting the presence of antimicrobial resistance and virulence-associated genes. We tested the microGWAS pipeline on a previously reported dataset on virulence, successfully identifying the causal variants and providing further interpretation of the association results. The microGWAS pipeline integrates state-of-the-art tools for bacterial GWAS into a single, user-friendly and reproducible pipeline, allowing for the democratization of these analyses. The pipeline, together with its documentation, can be accessed at https://github.com/microbial-pangenomes-lab/microGWAS.
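As an illustration of the last two analysis steps mentioned above (thresholding associations, then testing for overrepresented annotations), the following is a minimal Python sketch and not the microGWAS implementation. It assumes a Bonferroni-style threshold and a one-sided hypergeometric test; the function names `passing_variants` and `enrichment_pvalue`, and all parameter names, are hypothetical and chosen only for this example.

```python
from math import comb


def passing_variants(pvalues, n_tests, alpha=0.05):
    """Keep variants whose p-value is below alpha / n_tests.

    Assumption: a Bonferroni-style correction; the actual pipeline may
    define its association threshold differently.
    """
    threshold = alpha / n_tests
    return {variant: p for variant, p in pvalues.items() if p < threshold}


def enrichment_pvalue(hits_in_term, hits_total, term_size, background_size):
    """One-sided hypergeometric test, P(X >= hits_in_term).

    background_size: number of annotated genes in the background
    term_size:       background genes annotated with the term
    hits_total:      associated genes drawn from the background
    hits_in_term:    associated genes annotated with the term
    """
    max_overlap = min(hits_total, term_size)
    total_draws = comb(background_size, hits_total)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, hits_total - k)
        for k in range(hits_in_term, max_overlap + 1)
    )
    return tail / total_draws
```

A small p-value from `enrichment_pvalue` would suggest that a term (for example a pathway or ontology category) is overrepresented among the associated genes; when many terms are tested, these p-values would themselves need multiple-testing correction.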

Funding
This study was supported by the:
  • German Academic Exchange Service
    • Principal Award Recipient: Bamu F. Damaris
  • Deutsche Forschungsgemeinschaft (Award 390874280)
    • Principal Award Recipient: Marco Galardini
  • Deutsche Forschungsgemeinschaft (Award 390874280)
    • Principal Award Recipient: Bamu F. Damaris
  • Deutsche Forschungsgemeinschaft (Award 390874280)
    • Principal Award Recipient: Judit Burgaya
This is an open-access article distributed under the terms of the Creative Commons Attribution License.
/content/journal/mgen/10.1099/mgen.0.001349
2025-02-11
2025-12-17

