
Abstract

Genomic analyses are widely applied to epidemiological, population genetic and experimental studies of pathogenic fungi. A wide range of methods are employed to carry out these analyses, typically without including controls that gauge the accuracy of variant prediction. The importance of tracking outbreaks at a global scale has raised the urgency of establishing high-accuracy pipelines that generate consistent results between research groups. To evaluate currently employed methods for whole-genome variant detection and develop best practices for fungal pathogens, we compared how 14 independent variant calling pipelines performed across 35 isolates from 4 distinct clades and evaluated the performance of variant calling, single-nucleotide polymorphism (SNP) counts and phylogenetic inference results. Although these pipelines used different variant callers and filtering criteria, we found high overall agreement of SNPs from each pipeline. This concordance correlated with site quality, as SNPs discovered by only a few pipelines tended to show lower mapping quality scores and depth of coverage than those recovered by all pipelines. We observed that the major differences between pipelines were due to variation in read trimming strategies, SNP calling methods and parameters, and downstream filtering criteria. We calculated specificity and sensitivity for each pipeline by aligning three isolates with chromosome-level assemblies and found that the GATK-based pipelines were well balanced between these metrics. Selection of trimming methods had a greater impact on SAMtools-based pipelines than on those using GATK. Phylogenetic trees inferred by each pipeline showed high consistency at the clade level, but there was more variability between isolates from a single outbreak, with pipelines that used more stringent cutoffs having lower resolution.
This project generated two truth datasets useful for routine benchmarking of variant calling: a consensus VCF of genotypes discovered by 10 or more pipelines across these 35 diverse isolates, and variants for 2 samples identified from whole-genome alignments. This study provides a foundation for evaluating SNP calling pipelines and developing best practices for future fungal genomic studies.
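
As a rough illustration of the two benchmarking ideas summarised above, the following Python sketch builds a consensus call set from sites reported by a minimum number of pipelines and scores one pipeline against a truth set. It is a minimal sketch under stated assumptions, not the method used in the study: the pipeline names, the toy SNP sites, the function names `consensus_sites` and `sensitivity_specificity`, and the count of assessed positions (`n_sites`) are all hypothetical, and each site is assumed to be biallelic so that one `(contig, position, alt)` tuple corresponds to one genomic position.

```python
from collections import Counter


def consensus_sites(pipeline_calls, min_support):
    """Return the sites reported by at least `min_support` pipelines."""
    support = Counter()
    for calls in pipeline_calls.values():
        support.update(set(calls))  # each pipeline votes once per site
    return {site for site, n in support.items() if n >= min_support}


def sensitivity_specificity(called, truth, n_sites):
    """Score a call set against a truth set over `n_sites` assessed positions."""
    tp = len(called & truth)       # true variants that were called
    fp = len(called - truth)       # calls absent from the truth set
    fn = len(truth - called)       # true variants that were missed
    tn = n_sites - tp - fp - fn    # positions correctly left uncalled
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical toy data: three pipelines, sites as (contig, position, alt).
pipeline_calls = {
    "pipeline_A": {("chr1", 100, "T"), ("chr1", 250, "G"), ("chr2", 40, "A")},
    "pipeline_B": {("chr1", 100, "T"), ("chr1", 250, "G")},
    "pipeline_C": {("chr1", 100, "T"), ("chr2", 40, "A"), ("chr2", 90, "C")},
}

# The study's consensus required 10 or more of 14 pipelines; 2 of 3 is used here.
consensus = consensus_sites(pipeline_calls, min_support=2)

# Hypothetical truth set, standing in for variants from whole-genome alignment.
truth = {("chr1", 100, "T"), ("chr1", 250, "G"), ("chr2", 40, "A")}
sens, spec = sensitivity_specificity(
    pipeline_calls["pipeline_C"], truth, n_sites=1000
)
print(f"sensitivity={sens:.3f} specificity={spec:.4f}")
```

In practice the call sets would be read from each pipeline's VCF rather than written by hand, and `n_sites` would be the number of reference positions that could actually be assessed. Because true negatives vastly outnumber variants in a genome, specificity tends to sit very close to 1, so small differences between pipelines are easier to see in the raw false-positive counts.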

Funding
This study was supported by the:
  • Commissariat Général à l'Investissement (Award ANR10-LABX-62-IBEID)
    • Principal Award Recipient: Christophe d’Enfert
  • Division of Intramural Research, National Institute of Allergy and Infectious Diseases (Award U19AI110818)
    • Principal Award Recipient: Christina A. Cuomo
This is an open-access article distributed under the terms of the Creative Commons Attribution License.
/content/journal/mgen/10.1099/mgen.0.000979
2023-04-12
2024-12-14