Abstract
Over the past decade, we have gained considerable insight into the identification of sequence variation within the rDNA array of Saccharomyces cerevisiae and its closest wild relative, Saccharomyces paradoxus. Yet substantial challenges remain in the computational characterisation of this complex genomic region. This study aimed to evaluate the use of variation graphs for this purpose, formally comparing their effectiveness with that of traditional linear approaches.
Specifically, we aimed to identify both partial and fixed variants (i.e. pSNPs, SNPs, pINDELs and INDELs) in the rDNA arrays of 10 diverse, haploid Saccharomyces cerevisiae strains with high-quality genomic datasets. We constructed two computational pipelines based on markedly different approaches. The first pipeline used the BWA read mapper and the BCFtools variant caller to identify variants against the linear S288c reference. The second pipeline used the vg tool to call variants against a graphical reference (based either on a graphical representation of the S288c genome or on a Saccharomyces cerevisiae pan-genome).
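The distinction between partial and fixed variants can be illustrated with a minimal sketch. It assumes that a variant is treated as fixed when nearly all reads covering a site carry the alternative allele, and as partial when only a fraction do. The `SiteCounts` type, the `classify` function and both thresholds are hypothetical and chosen for illustration only; they are not taken from either pipeline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SiteCounts:
    """Read counts at one rDNA position (hypothetical structure)."""
    position: int
    ref_reads: int   # reads supporting the reference allele
    alt_reads: int   # reads supporting the alternative allele
    is_indel: bool   # True for an insertion/deletion, False for a substitution


def classify(site: SiteCounts,
             fixed_threshold: float = 0.95,
             min_alt_fraction: float = 0.05) -> Optional[str]:
    """Label a site as SNP, pSNP, INDEL or pINDEL, or None if no variant.

    Both thresholds are illustrative assumptions, not published values.
    """
    depth = site.ref_reads + site.alt_reads
    if depth == 0:
        return None  # no coverage, so nothing can be called
    alt_fraction = site.alt_reads / depth
    if alt_fraction < min_alt_fraction:
        return None  # too few supporting reads to report a variant
    kind = "INDEL" if site.is_indel else "SNP"
    # A fixed variant is present in (nearly) all reads; otherwise it is partial.
    return kind if alt_fraction >= fixed_threshold else "p" + kind


if __name__ == "__main__":
    sites = [
        SiteCounts(position=100, ref_reads=0, alt_reads=50, is_indel=False),
        SiteCounts(position=200, ref_reads=60, alt_reads=40, is_indel=True),
        SiteCounts(position=300, ref_reads=99, alt_reads=1, is_indel=False),
    ]
    for site in sites:
        print(site.position, classify(site))
```

Because the classification depends directly on the read counts at each site, differences in read coverage between two pipelines could change which label a variant receives.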
The graph-based pipeline identified more variants than the linear pipeline, particularly partial variants, but it also missed some key variants identified by BWA/BCFtools. A major discrepancy between the two pipelines was found in the read coverage at loci where the vg pipeline identified variants. In the coming months, we aim to investigate the cause of these differences and to develop a new graph-based computational pipeline that can accurately identify the full range of sequence and copy number variation within this key genomic region.