
Abstract

Bacterial genome-wide association studies (bGWAS) capture associations between genomic variation and phenotypic variation. Convergence-based bGWAS methods identify genomic mutations that arise independently multiple times on the phylogenetic tree, in the presence of phenotypic variation, more often than expected by chance. This work introduces hogwash, an open-source R package that implements three algorithms for convergence-based bGWAS. Hogwash also provides two burden testing approaches for gene- or pathway-level analysis, which improve power and increase convergence detection for related but weakly penetrant genotypes. To identify optimal use cases, we applied hogwash to data simulated with a variety of phylogenetic signals and convergence distributions. These simulated data are publicly available and include the relevant metadata on convergence and phylogenetic signal for each phenotype and genotype. Hogwash is available for download from GitHub.
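
To make the idea of "more often than expected by chance" concrete, the following is a minimal sketch of a permutation test over tree edges, written in Python for illustration only. It is not hogwash's implementation and does not reproduce any of its three algorithms. The function names (`burden_collapse`, `convergence_pvalue`), the 0/1 per-edge encoding, and the choice to shuffle genotype events uniformly across edges are all assumptions made for this example.

```python
import random


def burden_collapse(variant_edges):
    """Hypothetical burden step: collapse several variants in one gene
    into a single per-edge indicator (1 if any variant changes on that edge)."""
    return [int(any(column)) for column in zip(*variant_edges)]


def convergence_pvalue(geno_edges, pheno_edges, n_perm=1000, seed=0):
    """Empirical p-value for genotype/phenotype co-occurrence on tree edges.

    geno_edges[i]  is 1 if the genotype changes on edge i, else 0.
    pheno_edges[i] is 1 if the phenotype changes (or is present) on edge i.
    Assumption: the null is built by shuffling genotype events across edges.
    """
    rng = random.Random(seed)
    observed = sum(g * p for g, p in zip(geno_edges, pheno_edges))
    shuffled = list(geno_edges)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null = sum(g * p for g, p in zip(shuffled, pheno_edges))
        if null >= observed:
            hits += 1
    # Add-one correction so the empirical p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: two variants in the same gene, scored on a tree with 12 edges.
variant_a = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
variant_b = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
gene = burden_collapse([variant_a, variant_b])
phenotype = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
observed, p_value = convergence_pvalue(gene, phenotype)
print(observed, p_value)
```

In this toy case neither variant alone covers all three phenotype edges, but the collapsed gene-level indicator does, which is the intuition behind grouping weakly penetrant variants. A real analysis would derive the per-edge indicators from ancestral state reconstruction on the phylogeny and would correct for multiple testing.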

Funding
This study was supported by the:
  • National Institute of Allergy and Infectious Diseases (Award 1U01Al124255)
    • Principal Award Recipient: Evan S. Snitkin
  • National Institutes of Health (Award T32GM007544)
    • Principal Award Recipient: Katie Saund

This is an open-access article distributed under the terms of the Creative Commons Attribution License.
2020-11-18
2024-04-18
