
Abstract

Accurate nucleotide variant calling is essential in microbial genomics, particularly for outbreak tracking and phylogenetics. This study evaluates variant calls derived from genome assemblies compared to traditional read-based variant-calling methods, using seven closely related isolates sequenced on Illumina and Oxford Nanopore Technologies platforms. By benchmarking multiple assembly and variant-calling pipelines against a ground truth dataset, we found that read-based methods consistently achieved high accuracy. Assembly-based approaches performed well in some cases but were highly dependent on assembly quality, as errors in the assembly led to false-positive variant calls. These findings underscore the need for improved assembly techniques before the potential benefits of assembly-based variant calling (such as reduced computational requirements and simpler data management) can be realized.

Funding
This study was supported by the:
  • National Health and Medical Research Council (Award APP1105525)
    • Principal Award Recipient: Timothy P. Stinear
  • Australian Research Council (Award DP240102465)
    • Principal Award Recipient: Timothy P. Stinear
  • Australian Research Council (Award DE250100677)
    • Principal Award Recipient: Ryan R. Wick

This is an open-access article distributed under the terms of the Creative Commons Attribution License. This article was made open access via a Publish and Read agreement between the Microbiology Society and the corresponding author’s institution.

/content/journal/acmi/10.1099/acmi.0.001025.v3
2025-05-28
2026-04-14

