Personal Multi-omic Data to Predict Rare Variants
Differences in our individual genomes can have large consequences for the functioning of our cells and ultimately for our health, and characterizing the function of genetic variation can help us understand the biology of human disease. We are interested in modeling the impact of human genetic variation by applying machine learning and statistical methods to the analysis of large-scale genomic data.
Predicting the impact of rare genetic variants in the personal genome
Personal genomics promises to give us a window into individual disease predisposition and to enable clinicians to tailor the best interventions for each of us. However, for the majority of genetic variants identified by genome sequencing, we currently have no reliable estimate of their impact on basic cellular processes in the individual, let alone on disease risk. Each individual carries genetic variants that are rare, some never seen in any previous study, and these cannot be easily assessed for disease risk through standard association methods. Rare variants may have a significant impact on our health, and we are investigating machine learning approaches that integrate diverse personal genomic measurements to predict which of these variants are likely to be harmful.
Modeling the effect of genetic variation on the cell
Identifying the influence of genetics on the human cell is a critical step toward understanding its impact on our health. Genetic variation outside the protein-coding regions of the genome has proven particularly difficult to interpret, but recent large-scale studies of gene expression offer a window into the regulatory impact of non-coding variants. We are interested in modeling the effect of regulatory variation on the complete human transcriptome, including how environmental and behavioral risk factors may interact with genetic variants to affect gene regulation. Our work includes developing statistical methods for predicting the impact of non-coding variation and methods for analyzing RNA-sequencing data.
Cellular networks in disease
We are modeling the complex networks that describe regulatory relationships among genes and their role in mediating the impact of genetic variation on cellular and disease phenotypes. We are developing methods for reconstructing transcriptome-wide networks that represent the interconnected regulation of both transcription and alternative splicing, and we are applying these methods in deeply phenotyped cohorts to understand the cascading impact from genetic variation to cellular networks to trait networks.