Proto-genes and de novo Gene Birth
Genes are traditionally viewed as the most informative part of the genome, in part because their sequences exhibit clear signatures of natural selection and tend to be conserved across species. In contrast, most non-genic sequences evolve rapidly in a seemingly random fashion and are thus perceived to be of negligible value; they are often referred to as "intergenic", "junk" or the "dark matter" of the genome. However, the conceptual boundaries between genes and non-genes have become blurred over the past decade. The first RNA-sequencing studies, performed in yeast in 2008, revealed that non-genic sequences are widely transcribed. A few years later, I and others showed that non-genic sequences are pervasively translated in yeast. Because non-genic sequences evolve rapidly, the peptides resulting from these translation events tend to lack sequence similarity with canonical, conserved proteins: their primary sequences are instead unique and species-specific. I propose that the loci encoding these peptides are neither genes nor non-genes, but instead "proto-genes". Proto-genes are transitory genetic entities that can evolve into novel species-specific genes or return to a non-genic, untranslated state. The union of peptides encoded by proto-genes, or "proto-peptides", constitutes the cell's "proto-proteome". Recent evidence suggests that many plant and animal species, including humans, have proto-genes and proto-proteomes. Research in my laboratory seeks to understand:
(1) What are the mechanisms and dynamics of proto-gene evolution?
(2) What are the physiological implications of the proto-proteome?
(3) What is the impact of proto-genes and de novo gene birth on medicine?