An atlas of human long non-coding RNAs reveals their heterogeneity and evidence of their widespread function (#206)
Long non-coding RNAs (lncRNAs) are largely heterogeneous and functionally uncharacterized. Using FANTOM5 Cap Analysis of Gene Expression (CAGE) data, we integrated multiple transcript collections to generate a comprehensive catalog of 27,919 high-confidence, 5′-complete human lncRNA genes and their expression profiles across 1,829 samples from the major human primary cell types and tissues. Genomic and epigenomic classification of these lncRNAs revealed that most intergenic lncRNAs are derived from enhancer-like regions rather than from promoters. Incorporating genetic and expression data, we showed that lncRNA loci enriched for trait-associated SNPs were specifically expressed in the cell types relevant to those traits, suggesting roles for these lncRNAs in numerous diseases. We further demonstrated that thousands of lncRNA loci overlapping eQTL-associated SNPs were co-expressed with the mRNA target of the eQTL. Combining these findings with conservation data allowed us to identify 19,175 potentially functional lncRNA loci in the human genome. The functional evidence and annotations of our lncRNA catalog are summarized and available as a web resource.