
Abstract

Next-generation sequencing (NGS) methods are low-cost, high-throughput technologies that produce thousands to millions of sequence reads. Despite the high number of raw reads, their short length relative to Sanger, PacBio or Nanopore reads complicates the assembly of genomic repeats. Many genome assembly tools are available, but assembling highly repetitive genome sequences using only NGS short reads remains challenging. Genome assembly of organisms responsible for important neglected diseases, such as Trypanosoma cruzi, the aetiological agent of Chagas disease, is known to be challenging because of the repetitive nature of their genomes. Only three of the six recognized discrete typing units (DTUs) of the parasite have published draft genomes, so genome evolution analyses in the taxon are limited. In this study, we developed a computational workflow that assembles highly repetitive genomes from Illumina reads by combining de novo and reference-based assembly strategies to overcome the intrinsic limitations of each. The highly repetitive genome of the human-infecting parasite T. cruzi 231 strain was used as a test subject. The combined-assembly approach benefits from the ability of reference-based assembly to resolve highly repetitive sequences and from the capacity of de novo assembly to recover genome-specific regions, improving the quality of the assembly. The acceptable confidence obtained from the analysis of our results shows that the combined approach is an attractive option for assembling highly repetitive genomes with NGS short reads. Phylogenomic analysis including the 231 strain, the first representative of DTU III to have its genome sequenced, was also performed and provides new insights into T. cruzi genome evolution.

/content/journal/mgen/10.1099/mgen.0.000156
2018-02-14
2020-01-29
