https://doi.org/10.1038/s41467-021-20911-3
Excited to finally share our new article featuring some of the main findings from my PhD thesis about de novo gene birth in baker’s yeast. It was just published today in @NatureComms !
Tweetorial to follow

@UPFBarcelona @the_prbb @imimat @chusdonlo
What makes de novo genes so special? Their origin story: it’s a rags-to-riches evolutionary tale!
Most new genes are just spin-offs of other genes (e.g. duplication, fusion, etc.), but de novo genes start “from scratch” from previously non-coding sequences.
2/10
Over time, these non-coding sequences can gain new promoters, new ORFs, and potentially become full-fledged proteins with novel functions.
In fact, de novo genes have already been discovered in dozens of species, including fruit fly, rice, and even us humans.
3/10
The vast majority of the human genome is non-coding, which, again, is the raw material necessary for de novo gene birth.
By contrast, the majority of the baker’s yeast genome is coding (70%). How can young de novo genes get started in a genome with so little raw material?
4/10
As we wanted to study the early stages of de novo gene birth, we couldn’t rely on the reference annotations. We designed an experiment with RNAseq for 11 different species of yeast + ribosome profiling for baker’s yeast, for two conditions:
1) rich media
2) oxidative stress
5/10
The RNAseq data allowed us to study both annotated & unannotated transcripts in baker’s yeast; we then checked if they were conserved in other closely-related species.
We also identified which ORFs were being actively translated using our ribosome profiling data.
6/10
We classified a subset of 213 taxonomically-restricted transcripts as de novo, many of which contain translated ORFs (97/213). These peptides were translated for the first time ever at some point over the last ~20 million years! 
7/10
As has been described in some other studies, we observed that our translated de novo peptides were significantly shorter, had lower coding scores, and had higher isoelectric points than those from conserved transcripts (which are mostly annotated genes).
8/10
Surprisingly, we found that about half of the de novo transcripts overlapped another transcript on the opposite strand (105/213); this configuration is much rarer for more conserved transcripts. Coding sequences with antisense overlap = interesting evolutionary dynamics!
9/10
There also appeared to be a correlation between the transcriptional regulation of de novo transcripts and that of their overlapping genes in response to changes in environmental conditions (oxidative stress in our case). This can affect the evolution of de novo genes in compact genomes.
10/10
These discoveries (and many more) are all thanks to my PhD supervisors Mar Alba @maralbasoler and Lucas Carey @LucasBCarey, as well as all of the coauthors, colleagues, and collaborators who made this project possible!