Michael A. Beer
The development and diversity of living organisms are encoded in their genomic DNA. The genomic DNA sequence specifies the set of gene products and the regulatory signals that direct the expression of these gene products in the proper cell types, in response to both developmental and environmental needs. The ultimate goal of our research is to decipher the regulatory logic encoded in DNA sequence, and to understand how these regulatory sequences specify gene expression. While significant progress has been made in some well-studied pathways, whole-genome sequences and microarray techniques for measuring the expression of all genes have the potential to enable the systematic elucidation of these mechanisms on a genomic scale. We are currently focused on 1) developing computational tools to identify functional regulatory elements in non-coding DNA, and 2) experimentally testing and characterizing how these elements function.
Computational Identification and Validation of DNA Regulatory Logic
In our computational work, we are using microarray gene expression data, genome-wide location analysis, and whole-genome DNA sequence to systematically identify DNA functional elements and infer combinatorial regulatory logic. We use pattern recognition algorithms to identify over-represented and phylogenetically conserved DNA sequence elements (or putative transcription factor binding sites). We then use a probabilistic Bayesian network to find the most likely functional constraints on the position, spacing, orientation, and combinations of these DNA sequence elements (Fig 1). This methodology has generated a large set of predictions for regulatory interactions, and is in principle applicable to any organism with microarray and genome sequence data.
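As a rough illustration of what "over-represented" can mean in practice, the sketch below scores one candidate sequence element by asking how surprising its frequency is in a set of co-expressed genes, given its frequency in all promoters. It uses a hypergeometric tail probability, which is one common choice and not necessarily the pattern recognition method used in our work. The promoter sequences, gene names, and the element `ACGTCA` are made-up toy values.

```python
from math import comb

def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K contain the element."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def kmer_enrichment(promoters, cluster, kmer):
    """Return (hits in cluster, hits genome-wide, enrichment p-value)."""
    N = len(promoters)                                   # all genes
    K = sum(kmer in seq for seq in promoters.values())   # genes with the element
    n = len(cluster)                                     # co-expressed genes
    k = sum(kmer in promoters[g] for g in cluster)       # of those, with the element
    return k, K, hypergeom_tail(k, n, K, N)

# Toy promoters (hypothetical sequences, far shorter than real ones).
promoters = {
    "a": "TTACGTCATT", "b": "ACGTCAGGGG", "c": "GGGACGTCAA", "d": "TTTTTTTTTT",
    "e": "GGGGCCCCAA", "f": "ATATATATAT", "g": "CCCCCCGGGG", "h": "TTGGTTGGTT",
    "i": "AACCAACCAA", "j": "GATTACAGAT",
}
cluster = ["a", "b", "c", "d"]   # assumed co-expressed group from microarray data

k, K, p_value = kmer_enrichment(promoters, cluster, "ACGTCA")
print(f"{k} of {len(cluster)} cluster genes, {K} genome-wide, p = {p_value:.4f}")
```

In a real analysis this score would be computed for many candidate elements, so the p-values would need correction for multiple testing, and phylogenetic conservation would be assessed as a separate filter.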
Fig 1. An example of regulatory constraints identified in genes involved in ribosomal RNA transcription and processing in yeast (Beer and Tavazoie, Cell 2004). Genes with two computationally discovered sequence elements, PAC and RRPE, with positional constraints, are tightly coregulated. In genes containing both elements but not satisfying the positional constraints, the distribution of pairwise correlations is close to random. The distribution of correlations (A), as well as examples of genes that do (B) and do not (C) satisfy the positional constraints, along with their expression patterns, are shown.
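The comparison summarized in Fig 1 can be sketched as follows: split the genes that contain both elements by whether they satisfy a positional constraint, then compare the mean pairwise correlation of expression profiles within each group. This is a minimal sketch under stated assumptions, not the Bayesian network analysis itself. The element sequences are placeholders rather than the real PAC and RRPE motifs, the constraint (element A upstream of element B, within a maximum spacing) is a simplified stand-in, and the promoters and expression values are invented.

```python
from itertools import combinations
from math import sqrt

ELEM_A = "CCGGTA"   # placeholder sequence, not the real PAC element
ELEM_B = "TTAGCA"   # placeholder sequence, not the real RRPE element
MAX_SPACING = 20    # assumed constraint: A upstream of B, starts within 20 bp

def find_all(seq, motif):
    """Start positions of every (possibly overlapping) occurrence of motif."""
    positions, start = [], seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions

def satisfies_constraint(promoter):
    """True if some A occurrence lies upstream of some B within MAX_SPACING."""
    return any(0 < b - a <= MAX_SPACING
               for a in find_all(promoter, ELEM_A)
               for b in find_all(promoter, ELEM_B))

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def mean_pairwise_correlation(profiles):
    pairs = list(combinations(profiles, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

# Toy genes: name -> (promoter, expression profile). All contain both elements.
genes = {
    "g1": ("AACCGGTATTTTAGCAGG", [1.0, 2.0, 3.0, 4.0]),
    "g2": ("CCGGTAAATTAGCATT", [2.0, 4.0, 6.0, 8.5]),
    "g3": ("GGCCGGTACTTAGCAA", [1.5, 2.5, 3.6, 4.4]),
    "g4": ("TTAGCAGGCCGGTAAA", [1.0, 3.0, 2.0, 4.0]),             # wrong order
    "g5": ("CCGGTA" + "A" * 30 + "TTAGCA", [4.0, 1.0, 3.0, 2.0]),  # too far apart
    "g6": ("ATTAGCACCGGTAGG", [2.0, 1.0, 4.0, 3.0]),              # wrong order
}

constrained = sorted(g for g, (p, _) in genes.items() if satisfies_constraint(p))
others = sorted(g for g in genes if g not in constrained)

mean_in = mean_pairwise_correlation([genes[g][1] for g in constrained])
mean_out = mean_pairwise_correlation([genes[g][1] for g in others])
print(f"constraint satisfied: {constrained}, mean r = {mean_in:.2f}")
print(f"constraint violated:  {others}, mean r = {mean_out:.2f}")
```

With real data, the full distribution of pairwise correlations in each group would be compared, as in panel A, rather than a single mean.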
In our experimental work, we are testing these computational predictions by rapidly generating transgenic GFP reporter strains in the fast-growing nematode C. elegans via microparticle bombardment. C. elegans is intermediate in genomic complexity: its genome is about 1/30th the size of the human genome yet contains a comparable number of genes, which aids the identification of regulatory elements. In addition, C. elegans is an attractive model system because most C. elegans genes and regulatory pathways are conserved in humans, the genomic sequence is of high quality, and many powerful genetic tools are available.