The Recycled Human Genome: Ubiquitous neofunctionalisation of pseudogenes via LINE1-mediated intron generation - ASN Events

The Recycled Human Genome: Ubiquitous neofunctionalisation of pseudogenes via LINE1-mediated intron generation (#112)

Daniel W Thomson 1, Xiu-Cheng Quek 1, Nenad Bartonicek 1, Marcel E Dinger 1
  1. Garvan Institute of Medical Research, Darlinghurst, NSW, Australia

DNA copies of messenger RNA (mRNA), called processed pseudogenes, comprise ~24% of annotated genomic elements in the human genome. They have sequence homology to an annotated gene, but as products of retrotransposition they lack introns and promoter elements, which compromises their transcription. Using RNA CaptureSeq in human brain to provide an unprecedented depth of analysis of pseudogene transcription, we observe transcription at 69% of annotated pseudogene loci, where previously only 9% of pseudogenes were annotated as transcribed (Gencode 2012). Aided by sequencing depth, we identify >800 pseudogenes that are expressed as spliced exons or 3’UTRs of adjacent genes. Unexpectedly, the majority (86%) of RNA sequencing reads at pseudogene loci are antisense.

We uncover a novel mechanism by which retrotransposition facilitates intron formation to generate gene isoforms incorporating processed pseudogenes. When transcribed in antisense, the hallmark 3’ poly(A) tail derived from mRNA reverse transcription resembles the 5’ poly(T) (polypyrimidine) tract, a requisite of the canonical splicing machinery. The splicing branch point and 3’ splice site are co-opted from the target site that is duplicated upon retrotransposon insertion. Previous reports have linked DNA transposons to intron generation in model eukaryotes (Huff, Zilberman et al. 2016, Nature)(Cavalier-Smith 1985, Nature); however, the impact of retrotransposition in generating fusion genes in the human genome has gone unnoticed and a mechanism has been lacking. Overall, this work provides the first evidence linking retrotransposition to intron gain and gene formation in the human genome. Via this mechanism we demonstrate that hundreds of processed pseudogenes have protein coding function, and hundreds of protein coding genes have novel isoforms incorporating pseudogenes.
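
The strand logic behind the poly(A)-to-poly(T) observation can be illustrated with a toy sketch. The sequence below is invented for illustration and is not taken from the study, and the helper names (reverse_complement, leading_run) are hypothetical. It only shows that reading a processed-pseudogene insert on the opposite strand places the former 3’ poly(A) tail at the 5’ end as a run of T; it does not model branch points, splice sites or target-site duplication.

```python
# Toy illustration only: the sequence and helper names are assumptions,
# not data or code from the abstract.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite DNA strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def leading_run(seq: str, base: str) -> int:
    """Count how many consecutive copies of `base` start the sequence."""
    count = 0
    for ch in seq:
        if ch != base:
            break
        count += 1
    return count


# Hypothetical processed pseudogene insert on the sense strand:
# an intronless body followed by the retrotransposed 3' poly(A) tail.
sense_insert = "ATGGCCGATTACG" + "A" * 12

# Antisense read-through sees the reverse complement of the insert.
antisense = reverse_complement(sense_insert)

print("sense    :", sense_insert)
print("antisense:", antisense)
print("leading T run in antisense:", leading_run(antisense, "T"))
```

In the antisense orientation the T-rich run sits upstream of the pseudogene body, which is the arrangement the abstract likens to the polypyrimidine tract preceding a 3’ splice site.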

#LorneGenome