
Abstract

Homoplasic SNPs are considered important signatures of strong (positive) selective pressure, and hence of adaptive evolution for clinically relevant traits such as antibiotic resistance and virulence. Here we present a new tool, SNPPar, for efficient detection and analysis of homoplasic SNPs from large whole-genome sequencing datasets (>1000 isolates and/or >100 000 SNPs). SNPPar takes as input an SNP alignment, a tree and an annotated reference genome, and uses a combination of simple monophyly tests and ancestral state reconstruction (ASR, via TreeTime) to assign mutation events to branches and identify homoplasies. Mutations are annotated at the codon and gene levels to facilitate analysis of convergent evolution. Testing on simulated data (120 alignments representing local and global samples) showed that SNPPar can detect homoplasic SNPs with very high specificity (zero false positives in all tests) and high sensitivity (zero false negatives in 89% of tests). SNPPar analysis of three empirically sampled datasets produced results that were concordant with previous studies, in terms of both individual homoplasies and evidence of convergence at the codon and gene levels. SNPPar analysis of a simulated alignment of ~64 000 genome-wide SNPs from 2000 genomes took ~23 min and ~2.6 GB of RAM to generate complete annotated results on a laptop. This analysis required ASR to be conducted for only 1.25% of SNPs, and the ASR step took ~23 s and 0.4 GB of RAM. SNPPar thus automates the detection and annotation of homoplasic SNPs efficiently and accurately from large SNP alignments. As demonstrated by the examples included here, this information can be readily used to explore the role of homoplasy in parallel and/or convergent evolution at the level of nucleotide, codon and/or gene.
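The idea of using a cheap monophyly test first and reserving reconstruction for the remaining SNPs can be illustrated with a small sketch. This is not SNPPar's code: the nested-tuple tree, the isolate names, the function names and the use of Fitch parsimony in place of TreeTime's maximum-likelihood ASR are all assumptions made for illustration, and the tree is assumed to be rooted and strictly bifurcating.

```python
# Illustrative sketch only; all names and data below are hypothetical.
# A rooted binary tree is written as nested tuples whose leaves are isolate names.

def tip_sets(node, out):
    """Append the tip set below every node to `out`; return the set for `node`."""
    if isinstance(node, str):
        tips = frozenset([node])
    else:
        tips = frozenset().union(*(tip_sets(child, out) for child in node))
    out.append(tips)
    return tips

def is_monophyletic(tree, carriers):
    """True if the carriers, or all remaining tips, make up exactly one clade."""
    out = []
    all_tips = tip_sets(tree, out)
    carriers = frozenset(carriers)
    return carriers in out or (all_tips - carriers) in out

def fitch_changes(node, allele_of):
    """Fitch parsimony pass: return (state set, minimum number of changes)."""
    if isinstance(node, str):
        return {allele_of[node]}, 0
    states, changes = None, 0
    for child in node:
        child_states, child_changes = fitch_changes(child, allele_of)
        changes += child_changes
        if states is None:
            states = child_states
        elif states & child_states:
            states = states & child_states
        else:
            states = states | child_states
            changes += 1  # no shared state, so one more mutation event is needed
    return states, changes

def classify_snp(tree, allele_of):
    """Label one SNP column as 'single event' or 'homoplasic'."""
    alleles = set(allele_of.values())
    if len(alleles) == 2:
        # Cheap test: if one allele's tips form a clade, one mutation explains it.
        carriers = [tip for tip, a in allele_of.items() if a == min(alleles)]
        if is_monophyletic(tree, carriers):
            return "single event"
    # Otherwise fall back to reconstruction (here parsimony, as a stand-in for ASR).
    _, changes = fitch_changes(tree, allele_of)
    return "homoplasic" if changes > len(alleles) - 1 else "single event"

tree = ((("A", "B"), ("C", "D")), ("E", "F"))
snp_clade = {"A": "T", "B": "T", "C": "C", "D": "C", "E": "C", "F": "C"}
snp_repeat = {"A": "T", "B": "C", "C": "T", "D": "C", "E": "C", "F": "C"}
print(classify_snp(tree, snp_clade), classify_snp(tree, snp_repeat))
```

In this toy case the first SNP is resolved by the monophyly test alone, while the second needs the reconstruction step, which mirrors the abstract's point that only a small fraction of SNPs required ASR.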

Keywords: bacteria, evolution, homoplasy, phylogeny
Funding
This study was supported by the:
  • Department of Health and Human Services, State Government of Victoria
    • Principal Award Recipient: Bernard Pope
  • Australian Research Council (Award DE190100805)
    • Principal Award Recipient: Sebastian Duchene
  • Sylvia and Charles Viertel Charitable Foundation
    • Principal Award Recipient: Kathryn E. Holt
  • Bill and Melinda Gates Foundation (Award OPP1175797)
    • Principal Award Recipient: Kathryn E. Holt

This is an open-access article distributed under the terms of the Creative Commons Attribution License. This article was made open access via a Publish and Read agreement between the Microbiology Society and the corresponding author’s institution.

/content/journal/mgen/10.1099/mgen.0.000694
2021-12-07
2022-01-28
