
Abstract

Viral metagenomics has fuelled a rapid change in our understanding of global viral diversity and ecology. Long-read sequencing and hybrid assembly approaches that combine long- and short-read technologies are now widely implemented in bacterial genomics and metagenomics. However, the use of long-read sequencing to investigate viral communities is still in its infancy. While Nanopore and PacBio technologies have been applied to viral metagenomics, it is not known to what extent the choice of technology affects the reconstruction of the viral community. We therefore constructed a mock bacteriophage community from previously sequenced phage genomes, sequenced it using Illumina, Nanopore and PacBio technologies, and tested a number of different assembly approaches. When using a single sequencing technology, Illumina assemblies were the best at recovering phage genomes. Nanopore- and PacBio-only assemblies performed poorly in comparison to Illumina in both genome recovery and error rates, both of which varied with the assembler used. The best Nanopore assembly had errors that manifested as SNPs and INDELs at frequencies 41 and 157 % higher than those found in Illumina-only assemblies, respectively, while the best PacBio assemblies had SNPs and INDELs at frequencies 12 and 78 % higher than those found in Illumina-only assemblies, respectively. Despite high read coverage, long-read-only assemblies recovered a maximum of one complete genome from any assembly, unless reads were down-sampled prior to assembly. Overall, the best approach was assembly from a combination of Illumina and Nanopore reads, which reduced error rates to levels comparable with short-read-only assemblies. When using a single technology, Illumina only was the best approach. The differences in genome recovery and error rates between technologies and assemblers had downstream impacts on gene prediction, viral prediction, and subsequent estimates of diversity within a sample. These findings provide a starting point for others choosing reads and assembly algorithms for the analysis of viromes.

Funding
This study was supported by the:
  • Medical Research Council (Award MR/L015080/1)
    • Principal Award Recipient: Andrew Millard
  • Medical Research Foundation National PhD Training Programme in Antimicrobial Resistance Research (Award MRF-145-0004-TPG-AVISO)
    • Principal Award Recipient: Ryan Cook

This is an open-access article distributed under the terms of the Creative Commons Attribution License. This article was made open access via a Publish and Read agreement between the Microbiology Society and the corresponding author’s institution.
/content/journal/mgen/10.1099/mgen.0.001198
2024-02-20
2024-05-19
Supplements

  • Supplementary material 1 (PDF)
  • Supplementary material 2 (Excel)