Short-read draft whole-genome assemblies can contain many contigs and can be affected by repeat regions, such as those caused by mobile element activity or inherently repetitive gene structure. Annotating such assemblies for gene content and functional activity can be challenging, especially when the predicted genes are fragmented across contigs, very large, repetitive or of unusual nucleotide composition. Genomes with very high or very low %GC content pose additional problems. The Pfam domain database [1] is a large, widely used collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). Rather than studying the predicted whole-gene content of draft genomes, or the presence/absence of specific genes in pan-core genome analyses, we examined predicted protein content by Pfam domain complement. Here we present Punchline, a workflow written in Python 3 that studies the genetic content of pangenome assemblies, including draft assemblies, by examining their complement of short protein domains. The domains can then be used in statistical comparisons between bacterial groups of interest that are supplied to the workflow. In addition, we show the application of Punchline to specific genomic data.
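The group comparison described above can be sketched roughly as follows. This is a minimal illustration only, not Punchline's implementation, which the abstract does not detail. It assumes each genome has already been reduced to a set of Pfam accessions, and it assumes a two-sided Fisher's exact test on domain presence/absence as the statistic. The function names, genome names, group assignments and the toy domain sets are all hypothetical; the accessions serve purely as labels.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups; columns are domain present / domain absent.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x "present" genomes in the first group.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def compare_domain_presence(domains_by_genome, group_a, group_b):
    """Return {Pfam accession: p-value} for presence/absence between two groups."""
    all_domains = set().union(*domains_by_genome.values())
    results = {}
    for dom in sorted(all_domains):
        a = sum(dom in domains_by_genome[g] for g in group_a)
        c = sum(dom in domains_by_genome[g] for g in group_b)
        results[dom] = fisher_exact_two_sided(a, len(group_a) - a, c, len(group_b) - c)
    return results


# Hypothetical toy input: genome name -> set of Pfam accessions found in it.
domains_by_genome = {
    "genomeA1": {"PF00005", "PF00665"},
    "genomeA2": {"PF00005", "PF00665"},
    "genomeA3": {"PF00005", "PF00665", "PF00072"},
    "genomeB1": {"PF00005"},
    "genomeB2": {"PF00005", "PF00072"},
    "genomeB3": {"PF00005"},
}
group_a = ["genomeA1", "genomeA2", "genomeA3"]
group_b = ["genomeB1", "genomeB2", "genomeB3"]

p_values = compare_domain_presence(domains_by_genome, group_a, group_b)
for accession, p in p_values.items():
    print(accession, round(p, 3))
```

Working at the level of domains rather than whole genes means a gene split across two contigs can still contribute its domains to the comparison. With many domains tested at once, a real analysis would also need a multiple-testing correction, which this sketch omits.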

1. S. El-Gebali, J. Mistry, A. Bateman, S.R. Eddy, A. Luciani, S.C. Potter, M. Qureshi, L.J. Richardson, G.A. Salazar, A. Smart, E.L.L. Sonnhammer, L. Hirsh, L. Paladin, D. Piovesan, S.C.E. Tosatto, R.D. Finn. The Pfam protein families database in 2019. Nucleic Acids Research (2019). doi: 10.1093/nar/gky995

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
