Computational Biology of Gene Expression
Overview
Research Summary
A typical human primary transcript is about 30 kilobases long and contains several exons separated by much larger and more variably sized introns. This discrepancy between human exon and intron lengths led to the “exon definition” model of splicing, in which splice sites are first paired across exons and spliceosome assembly then proceeds through pairing of exon units. In the alternative “intron definition” model, splice sites are initially paired across introns rather than exons. Intron definition is thought to be the predominant mode of splicing in transcripts containing short introns and long exons. We have analyzed sequence features involved in the recognition of short introns using available transcript data from five eukaryotes with complete or nearly complete genomic sequences. The information content of five different transcript features was measured using methods from information theory, and Monte Carlo simulations were used to determine the amount of information required for accurate recognition of short introns in each organism. We found that short introns in Drosophila melanogaster and Caenorhabditis elegans contain essentially all of the information needed for their recognition by the splicing machinery: computer programs that simulate splicing specificity can predict the exact boundaries of approximately 95% of short introns in both organisms. In yeast, the 5’ splice site (5’ss), branch signal and 3’ splice site (3’ss) can accurately identify intron locations but do not precisely determine the location of 3’ cleavage in every intron. The 5’ss, branch signal and 3’ss are clearly not sufficient to accurately identify short introns in plant and human transcripts. However, specific subsets of short, intron-biased motifs can be identified in both human and Arabidopsis, and these contribute dramatically to the accuracy of splicing simulators, suggesting that intronic splicing enhancers play a large role in these organisms.
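As an illustration of what "information content" of a transcript feature can mean, the sketch below computes the per-position information, in bits, of a set of aligned splice-site sequences as the relative entropy against a background base composition. This is a minimal sketch of one common information-theoretic measure, not necessarily the measure used in the work summarized here. The function name `information_content`, the uniform background, and the toy 5’ss sequences are all assumptions made for the example, and no small-sample correction is applied.

```python
import math
from collections import Counter


def information_content(aligned_sites, background=None):
    """Return the information (bits) at each position of equal-length aligned sites.

    Information is measured as relative entropy against a background base
    composition. A uniform background is assumed when none is given.
    """
    if background is None:
        background = {base: 0.25 for base in "ACGT"}  # assumption: uniform background
    length = len(aligned_sites[0])
    bits = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in aligned_sites)
        total = sum(counts.values())
        info = 0.0
        for base, n in counts.items():
            p = n / total
            # Bases that never occur at this position contribute nothing to the sum.
            info += p * math.log2(p / background[base])
        bits.append(info)
    return bits


# Hypothetical toy 5' splice sites (illustrative only, not real data).
sites = ["GTAAGT", "GTAAGT", "GTGAGT", "GTAAGA"]
per_position = information_content(sites)
total_bits = sum(per_position)
```

Summing the per-position values gives a total for the feature under the simplifying assumption that positions are independent. Totals from several features could then be compared with the amount of information needed to single out true introns among decoy sites in a transcript.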
It is well established that many exons contain internal sequences that either enhance or repress splicing, and that other enhancers and repressors are commonly found in introns. We are developing computational methods for identifying novel splicing enhancer motifs based on the hypothesis that motifs which function as exonic enhancers should have two essential properties: 1) significantly higher frequency in exons than in introns; and 2) significantly higher frequency in exons with ‘weak’ (non-consensus) splice signals than in exons with strong, consensus-matching splice signals. This screen clearly identifies several known classes of splicing enhancers, including purine-rich elements in exons and GGG motifs in introns. Several novel classes of candidate enhancers are also identified. Both known and candidate enhancer motifs tend to be preferentially located at specific distances from splice junctions. The next step is to test the functions of candidate enhancers using in vitro and in vivo splicing assays. A similar approach will be used to screen for intronic enhancers and for splicing repressors.
Gene finding: We have recently developed GenomeScan, a new algorithm for identifying the locations and exon-intron structures of genes in genomic sequences. This algorithm is related to our previous Genscan algorithm but achieves higher accuracy by taking into account BLASTX similarity to available proteins. Application of this method to the assembled draft + finished human genome sequence identifies approximately 25,000 human genes that are homologous to known proteins. We plan to adapt GenomeScan to other eukaryotic genomes and to use the genes identified with this approach in comparative genomics studies.
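The two-property screen for exonic enhancer motifs described earlier can be sketched roughly as follows. This is a simplified illustration, not the actual screening method: it substitutes a plain frequency-ratio cutoff for a proper significance test, and it assumes the exons have already been sorted into ‘weak’ and ‘strong’ splice-signal groups. The function names, the pseudocount, the `min_ratio` threshold, and the toy sequences are all hypothetical.

```python
from collections import Counter


def kmer_counts(seqs, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    total = 0
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
            total += 1
    return counts, total


def freq_ratio(counts_a, total_a, counts_b, total_b, kmer, pseudo=1.0):
    """Ratio of k-mer frequencies in set A versus set B, with crude smoothing."""
    freq_a = (counts_a[kmer] + pseudo) / (total_a + pseudo)
    freq_b = (counts_b[kmer] + pseudo) / (total_b + pseudo)
    return freq_a / freq_b


def screen_enhancers(weak_exons, strong_exons, introns, k=6, min_ratio=1.5):
    """Return k-mers enriched in exons vs introns AND in weak vs strong exons."""
    exon_c, exon_n = kmer_counts(weak_exons + strong_exons, k)
    intron_c, intron_n = kmer_counts(introns, k)
    weak_c, weak_n = kmer_counts(weak_exons, k)
    strong_c, strong_n = kmer_counts(strong_exons, k)
    hits = []
    for kmer in exon_c:
        r_exon = freq_ratio(exon_c, exon_n, intron_c, intron_n, kmer)    # property 1
        r_weak = freq_ratio(weak_c, weak_n, strong_c, strong_n, kmer)    # property 2
        if r_exon >= min_ratio and r_weak >= min_ratio:
            hits.append((kmer, r_exon, r_weak))
    # Rank candidates by the product of the two enrichment ratios.
    return sorted(hits, key=lambda h: h[1] * h[2], reverse=True)


# Hypothetical toy sequences (illustrative only, not real data).
hits = screen_enhancers(
    weak_exons=["GAAGAAGAAGAA"],
    strong_exons=["CTCTCTCTCTCT"],
    introns=["TTTTTTTTTTTT"],
    k=3,
)
```

In this toy input only the purine-rich 3-mers from the weak exon pass both filters. A real screen would use a statistical test with multiple-testing correction in place of the fixed ratio cutoff.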
Alternative splicing: To study the process of alternative splicing, we are constructing databases of alternatively spliced genes. We are also identifying genes that exhibit conserved patterns of alternative splicing between human and mouse, together with conserved regions in the introns flanking the alternatively spliced exons; such conservation suggests the presence of regulation. Experiments are underway to study the regulation and possible function of one particularly interesting alternatively spliced gene, a member of the MAP kinase family.