Volume 1, Issue 1, 2015
- Research Paper
- Genomic Methodologies
- Data clustering methods
K-Pax2: Bayesian identification of cluster-defining amino acid positions in large sequence datasets
The recent growth in publicly available sequence data has introduced new opportunities for studying microbial evolution and spread. Because the pace of sequence accumulation tends to exceed the pace of experimental studies of protein function and the roles of individual amino acids, statistical tools to identify meaningful patterns in protein diversity are essential. Large sequence alignments from fast-evolving micro-organisms are particularly challenging to dissect using standard tools from phylogenetics and multivariate statistics because biologically relevant functional signals are easily masked by neutral variation and noise. To meet this need, a novel computational method is introduced that is easily executed in parallel using a cluster environment and can handle thousands of sequences with minimal subjective input from the user. The usefulness of this kind of machine learning is demonstrated by applying it to nearly 5000 haemagglutinin sequences of influenza A/H3N2. Antigenic and 3D structural mapping of the results show that the method can recover the major jumps in antigenic phenotype that occurred between 1968 and 2013 and identify specific amino acids associated with these changes. The method is expected to provide a useful tool to uncover patterns of protein evolution.
- Systems Microbiology
- Transcriptomics, proteomics, networks
Expanded roles of leucine-responsive regulatory protein in transcription regulation of the Escherichia coli genome: Genomic SELEX screening of the regulation targets
Leucine-responsive regulatory protein (Lrp) is a transcriptional regulator for the genes involved in transport, biosynthesis and catabolism of amino acids in Escherichia coli. In order to identify the whole set of genes under the direct control of Lrp, we performed Genomic SELEX screening and identified a total of 314 Lrp-binding sites on the E. coli genome. As a result, the regulation target of Lrp was predicted to expand from the hitherto identified genes for amino acid metabolism to a set of novel target genes for utilization of amino acids for protein synthesis, including tRNAs, aminoacyl-tRNA synthetases and rRNAs. Northern blot analysis indicated alteration of mRNA levels for at least some novel targets, including the aminoacyl-tRNA synthetase genes. Phenotype MicroArray of the lrp mutant indicated significant alteration in utilization of amino acids and peptides, whilst metabolome analysis showed variations in the concentration of amino acids in the lrp mutant. From these two datasets we observed an inverse correlation between amino acid levels and cell growth rate: fast-growing cells contain low levels of amino acids, whilst slow-growing cells contain high levels. Taken together, we propose that Lrp is a global regulator of transcription of a large number of the genes involved in not only amino acid transport and metabolism, but also amino acid utilization.