
Abstract

It is now possible to assemble near-perfect bacterial genomes using Oxford Nanopore Technologies (ONT) long reads, but short-read polishing is usually required for perfection. However, the effect of short-read depth on polishing performance is not well understood. Here, we introduce Pypolca (with default and careful parameters) and Polypolish v0.6.0 (with a new careful parameter). We then show that: (1) all polishers other than Pypolca-careful, Polypolish-default and Polypolish-careful commonly introduce false-positive errors at low read depth; (2) most of the benefit of short-read polishing occurs by 25× depth; (3) Polypolish-careful almost never introduces false-positive errors at any depth; and (4) Pypolca-careful is the single most effective polisher. Overall, we recommend the following polishing strategies: Polypolish-careful alone when depth is very low (<5×), Polypolish-careful and Pypolca-careful when depth is low (5–25×), and Polypolish-default and Pypolca-careful when depth is sufficient (>25×).
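The depth-based recommendation above can be written as a small decision rule. The sketch below is a minimal illustration and is not part of Pypolca or Polypolish. It makes several assumptions. The function names (`fastq_total_bases`, `short_read_depth`, `polishing_strategy`) are hypothetical. Depth is estimated as total short-read bases divided by assembly length. The FASTQ input is assumed to use standard four-line records. Exactly 5× and exactly 25× are both treated as "low" depth, which is one reading of the 5–25× range. The returned polisher names follow the order used in the abstract, and the sketch does not imply a run order.

```python
from __future__ import annotations

from typing import Iterable


def fastq_total_bases(lines: Iterable[str]) -> int:
    """Sum sequence lengths in a FASTQ stream.

    Assumes standard 4-line records (header, sequence, '+', qualities),
    so the sequence is every line whose index is 1 modulo 4.
    """
    return sum(len(line.strip()) for i, line in enumerate(lines) if i % 4 == 1)


def short_read_depth(total_bases: int, genome_size: int) -> float:
    """Mean short-read depth: total sequenced bases / assembly length (bp)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def polishing_strategy(depth: float) -> list[str]:
    """Map mean short-read depth to the polishers recommended in the abstract.

    Assumption: 5x and 25x both fall in the 'low' (5-25x) band.
    """
    if depth < 0:
        raise ValueError("depth must be non-negative")
    if depth < 5:  # very low depth: Polypolish-careful alone
        return ["Polypolish-careful"]
    if depth <= 25:  # low depth: both careful modes
        return ["Polypolish-careful", "Pypolca-careful"]
    # sufficient depth (>25x): default Polypolish plus careful Pypolca
    return ["Polypolish-default", "Pypolca-careful"]


if __name__ == "__main__":
    # 60 Mbp of short reads over a 5 Mbp assembly gives 12x mean depth
    depth = short_read_depth(60_000_000, 5_000_000)
    print(depth, polishing_strategy(depth))
```

For example, 60 Mbp of short reads against a 5 Mbp assembly gives 12× depth, which falls in the low-depth band and maps to Polypolish-careful plus Pypolca-careful. Mean depth is only an approximation. Coverage can be uneven across a genome, so local depth in some regions may be well below the mean.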

Funding
This study was supported by the following:
  • National Health and Medical Research Council (Award GNT1194325)
    • Principal Award Recipient: Timothy P. Stinear
  • Garnett Passe and Rodney Williams Memorial Foundation
    • Principal Award Recipient: Sarah Vreugde
  • National Institutes of Health (Award RC2DK116713)
    • Principal Award Recipient: Robert A. Edwards

This is an open-access article distributed under the terms of the Creative Commons Attribution License. This article was made open access via a Publish and Read agreement between the Microbiology Society and the corresponding author’s institution.
DOI: 10.1099/mgen.0.001254
2024-06-04
2024-06-17
Supplements

Supplementary material 1 (PDF)

Supplementary material 2 (Excel)