Understanding the expression of human disease genes using FANTOM5 expression profiles. (#205)
Whole exome and whole genome sequencing have become powerful tools for diagnosing human genetic disease. Even so, sequencing often identifies more potentially disease-causing coding sequence variants than can feasibly be verified by functional studies. To address this, gene prioritisation techniques attempt to distinguish the causal variant by ranking genes according to how likely they are, based on their known interactions, function, and/or expression, to be responsible for the observed disease phenotype.
Gene expression differs dramatically between the tissues and cells of the body. Previous studies have found that gene expression in healthy individuals can be used in gene prioritisation, but the relationship between tissue- or cell-specific expression and disease is not yet fully understood. Large projects such as FANTOM5 (Functional Annotation of the Mammalian Genome) and GTEx (Genotype-Tissue Expression) have measured gene expression across a wide range of tissues and cells, yet these data sets are not routinely used in gene prioritisation. There is therefore considerable room to improve both our understanding of the relationship between gene expression and disease and our ability to prioritise candidate disease genes.
In this work, we used FANTOM5 gene expression data to explore the relationship between tissue- or cell-specific gene expression and disease. This analysis identified both high expression and tissue-specific expression as important characteristics when prioritising human disease genes. We noted marked differences in the success of prioritising genes associated with dominant and recessive disease, suggesting that gene prioritisation should be tailored to the suspected mode of inheritance. Furthermore, we assessed a diverse range of tissue and cell expression profiles, identifying the most informative sample sets for gene prioritisation across different disease characteristics. The results of our work can be used directly to assist in the diagnosis of genetic disease, and can also be applied to improve gene prioritisation tools and pipelines.