Shetti, a simple tool to parse, manipulate and search large datasets of sequences

Haitham Sobhy

doi:10.1099/mgen.0.000035

Volume 1, Issue 5

Research Article

Open Access


Haitham Sobhy¹

Affiliations: ¹ Dalian Institute of Chemical Physics, CAS, Dalian, PR China
Correspondence: Haitham Sobhy ([email protected])
Published: 06 November 2015 https://doi.org/10.1099/mgen.0.000035

Abstract

Parsing and manipulating long and/or multiple protein or gene sequences can be challenging for experimental biologists and microbiologists who lack prior knowledge of bioinformatics and programming. Here we present a simple, user-friendly and versatile tool to parse, manipulate and search within large datasets of long and multiple protein or gene sequences. The Shetti tool can be used to search for a sequence, species, protein/gene or pattern/motif. It can also be used to construct a universal consensus or molecular signatures for proteins based on their physical characteristics. Shetti handles large sets of long sequences quickly and efficiently. It parses UniProt Knowledgebase and NCBI GenBank flat files and displays them as a table.
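As an illustration of the pattern/motif search described above, the following is a minimal Python sketch and not Shetti's own implementation. It assumes FASTA-formatted input and a motif given as a regular expression; the function names parse_fasta and find_motif and the example sequences are hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_motif(records: Iterable[Tuple[str, str]], pattern: str) -> Dict[str, List[int]]:
    """Map each matching record header to the 1-based start positions of the motif."""
    # A lookahead lets the search report overlapping occurrences as well.
    regex = re.compile(f"(?=({pattern}))")
    hits: Dict[str, List[int]] = {}
    for header, seq in records:
        positions = [m.start() + 1 for m in regex.finditer(seq)]
        if positions:
            hits[header] = positions
    return hits


# Hypothetical example data, for illustration only.
EXAMPLE_FASTA = """>seq1 hypothetical protein A
MKRGDSLLNRGDA
>seq2 hypothetical protein B
MAAAPLLK
"""

hits = find_motif(parse_fasta(EXAMPLE_FASTA), "RGD")
print(hits)
```

Only records that contain the motif appear in the result, so the output can be read as a small table of sequence names and match positions.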

Received: 18/06/2015
Accepted: 21/09/2015
Published Online: 06/11/2015

Keywords: comparative genomics, consensus pattern, functional motif/domain, protein/gene sequences, sequence manipulation

This is an open-access article distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the source is credited.

Erratum

An erratum has been published for this content:

Shetti, a simple tool to parse, manipulate and search large dataset of sequences



M Gen 1, e000035 (2015); https://doi.org/10.1099/mgen.0.000035
