Abstract

Increasingly rich metadata are now being linked to samples that have been whole-genome sequenced. However, much of this information is ignored, because linking these metadata to genes, or to regions of the genome, usually relies on knowing the gene sequence(s) responsible for the trait being measured and looking for their presence or absence in each genome. An example is the spread of antimicrobial resistance genes carried on mobile genetic elements (MGEs). Although the resistance gene can be identified routinely, identifying the unknown MGE on which it is carried can be much more difficult if the starting point is short-read whole-genome sequence data. This is because MGEs are often full of repeats and so assemble poorly, leading to fragmented consensus sequences. Since mobile DNA, which can carry many clinically and ecologically important genes, has a different evolutionary history from the host, its distribution across the host population will, by definition, be independent of the host phylogeny. This phenomenon can be used in a genome-wide association study to identify both the genes associated with a specific trait and the DNA linked to those genes, for example the flanking sequence of the plasmid vector on which a gene is encoded, which follows the same pattern of distribution as the marker gene or sequence itself. We present PlasmidTron, which utilizes the phenotypic data normally available in bacterial population studies, such as antibiograms, virulence factors, or geographical information, to identify traits that are likely to be present on DNA that can randomly reassort across defined bacterial populations. PlasmidTron can also be used to associate unknown genes or sequences (e.g. plasmid backbones) with a specific molecular signature or marker (e.g. resistance gene presence or absence).
PlasmidTron uses a k-mer-based approach to identify reads associated with a phylogenetically unlinked phenotype. These reads are then assembled de novo to produce contigs, in a manner that is fast and scales to large datasets. PlasmidTron is written in Python 3 and is available under the open source licence GNU GPL3 from https://github.com/sanger-pathogens/plasmidtron.
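
The k-mer idea described above can be illustrated with a small, in-memory sketch. This is not PlasmidTron's code: the function names (`kmers`, `sample_kmers`, `trait_specific_kmers`, `select_reads`) are hypothetical, reads are plain strings rather than FASTQ records, reverse complements and k-mer abundance thresholds are ignored, and the de novo assembly step is left out. It only shows the core logic, assuming that a k-mer counts as trait-associated when it occurs in at least one sample with the trait and in no sample without it.

```python
from typing import Iterable, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all k-mers contained in one sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sample_kmers(reads: Iterable[str], k: int) -> Set[str]:
    """Return the union of k-mers over all reads of one sample."""
    found: Set[str] = set()
    for read in reads:
        found |= kmers(read, k)
    return found


def trait_specific_kmers(
    trait_samples: List[List[str]],
    nontrait_samples: List[List[str]],
    k: int,
) -> Set[str]:
    """K-mers seen in any trait sample but in no non-trait sample (assumed rule)."""
    with_trait = set().union(*(sample_kmers(s, k) for s in trait_samples))
    without_trait = set().union(*(sample_kmers(s, k) for s in nontrait_samples))
    return with_trait - without_trait


def select_reads(reads: Iterable[str], specific: Set[str], k: int) -> List[str]:
    """Keep reads that contain at least one trait-specific k-mer."""
    return [read for read in reads if kmers(read, k) & specific]


if __name__ == "__main__":
    # Toy data: the second read is only present in the sample with the trait.
    trait = [["ACGTACGTTT", "GGGCCCAAAT"]]
    nontrait = [["ACGTACGTTT"]]
    specific = trait_specific_kmers(trait, nontrait, k=4)
    print(select_reads(trait[0], specific, k=4))
```

In a real analysis the selected reads would then be passed to a de novo assembler, and the k-mer sets would be computed with a dedicated k-mer counter rather than Python sets, which do not scale to whole-genome read sets.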

/content/journal/mgen/10.1099/mgen.0.000164
2018-03-12
2024-03-28
