Abstract
The large amount of data from inexpensive sequencing means that the number of putative biosynthetic gene clusters (BGCs) far exceeds our ability to experimentally characterize them. This necessitates the development of further tools to analyze putative BGCs and flag those of interest for further characterization. clusterTools implements a framework to aid in the in silico characterization of BGCs by identifying regions of DNA in which homologous proteins, or coding sequences with specific functional domain compositions defined by user-built HMM rules, lie in close proximity, and by reporting the results in an easy-to-visualize manner. clusterTools complements existing software for BGC analysis in two ways. First, by running clusterTools on databases of genomic sequences in an exploratory mode, the user can identify and download regions of interest in the DNA for further processing and annotation in programs such as antiSMASH. Second, if clusterTools is run on databases constructed from putative gene clusters generated by antiSMASH, one can rapidly identify the clusters of interest within the group that warrant further analysis and experimental characterization. We demonstrate the use of clusterTools as part of our workflow to identify BGCs of specific classes of natural products that would be difficult to identify with existing methods, particularly clusters containing assembly line domains as components, including those involved in bacterial polyketide alkaloid biosynthesis. clusterTools can also be used to identify novel BGCs by incorporating regulatory and antibiotic resistance elements. Standalone versions of clusterTools are available for Macintosh, Windows, and Linux.
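The proximity search described above can be illustrated with a minimal sketch. This is not the clusterTools implementation: the `Hit` record, the `find_colocalized` function, the rule names, and the `max_gap` threshold are all hypothetical, chosen only to show the general idea of grouping nearby hits on a contig and keeping the groups that satisfy every required rule.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One matched coding sequence (hypothetical record layout)."""
    contig: str
    start: int
    end: int
    rule: str  # name of the homology or HMM rule that matched


def find_colocalized(hits, required_rules, max_gap=20000):
    """Return (contig, start, end) regions in which neighbouring hits are at
    most max_gap bp apart and every required rule is matched at least once."""
    required = set(required_rules)

    # Keep only hits for the rules of interest, grouped by contig.
    by_contig = defaultdict(list)
    for hit in hits:
        if hit.rule in required:
            by_contig[hit.contig].append(hit)

    regions = []
    for contig, contig_hits in by_contig.items():
        contig_hits.sort(key=lambda h: h.start)
        group = [contig_hits[0]]
        group_end = contig_hits[0].end

        def emit():
            # Report the group only if it covers all required rules.
            if {h.rule for h in group} >= required:
                regions.append((contig, group[0].start, group_end))

        for hit in contig_hits[1:]:
            if hit.start - group_end <= max_gap:
                # Close enough to the current group: extend it.
                group.append(hit)
                group_end = max(group_end, hit.end)
            else:
                # Gap too large: close the current group and start a new one.
                emit()
                group = [hit]
                group_end = hit.end
        emit()
    return regions
```

In this sketch, a region is reported only when all rules co-occur within the chosen distance, so a lone hit for a single rule is ignored. A real tool would also need to parse search output, handle strand and contig boundaries, and expose the distance threshold to the user.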
- Published Online: