Abstract

Generating complete, high-quality genome assemblies is key for any downstream analysis, such as comparative genomics. For bacterial genome assembly, various algorithms and fully automated pipelines exist, which are free of charge and easily accessible. However, these assembly tools often cannot unambiguously resolve a bacterial genome, for example due to the presence of sequence repeat structures on the chromosome or on plasmids. In such cases, a more sophisticated approach and/or manual curation is needed. Such modifications can be challenging, especially for non-bioinformaticians, because they are generally not straightforward. In this study, we propose a standardized approach for manual genome completion focusing on the popular hybrid assembly pipeline Unicycler. The provided Galaxy workflow addresses two weaknesses in Unicycler’s hybrid assemblies: (i) collapse of inter-plasmidic repeats and (ii) false loss of single-copy sequences. To demonstrate and validate how to detect and resolve these assembly errors, we use two genomes from the group. By applying the proposed pipeline following an automated assembly, the genome sequence quality can be significantly improved.

Funding
This study was supported by the:
  • Bundesministerium für Ernährung und Landwirtschaft (Award BioSam)

This is an open-access article distributed under the terms of the Creative Commons Attribution License.
/content/journal/mgen/10.1099/mgen.0.001173
2024-01-10
2025-04-21