HOME: A HISTOGRAM-BASED MACHINE LEARNING APPROACH TO IDENTIFY DIFFERENTIALLY METHYLATED REGIONS IN A GENOME (#164)
A major challenge in DNA methylome analysis is the accurate identification of differentially methylated regions (DMRs) in the genome. Localized changes in DNA methylation are observed in disease, stress, and development, and are often associated with functionally important regions of the genome, including promoters and enhancers. Sensitive and specific identification of genomic regions that are differentially methylated between conditions across large groups of samples requires efficient and accurate algorithms. Although various tools have been proposed to tackle this problem, they remain limited in precision and accuracy. Using supervised machine learning, we overcome three limitations of current methods: inaccurate DMR boundary detection, a high volume of spurious DMRs, and the inability to identify DMRs in time series data. We have developed novel Histogram Of MEthylation (HOME) based features that exploit the inherent difference in methylation between DMRs and non-DMRs to robustly discriminate between the two. These features are used to train a linear Support Vector Machine (SVM) classifier that identifies DMRs from aligned whole genome bisulfite sequencing data. HOME can accurately identify DMRs among any number of sample groups, with or without replicates. It can also identify DMRs in time series data, which, to the best of our knowledge, no existing method supports. We also demonstrate that HOME produces more precise DMR boundaries than existing methods. We have rigorously tested our algorithm against current state-of-the-art methods, and the results show that HOME outperforms them on both simulated and real-world biological datasets.
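The abstract does not specify how the histogram features are constructed, so the following is only an illustrative sketch of the general idea, not the HOME implementation. It assumes per-CpG methylation levels in [0, 1] for two conditions within a fixed window and bins the absolute differences into a normalized histogram. The function name `methylation_histogram` and the choice of equal-width bins are hypothetical.

```python
from typing import List, Sequence


def methylation_histogram(
    levels_a: Sequence[float],
    levels_b: Sequence[float],
    bins: int = 10,
) -> List[float]:
    """Normalized histogram of absolute per-CpG methylation differences.

    Hypothetical feature construction (an assumption, not taken from the
    abstract): each input holds methylation levels in [0, 1] for the same
    CpG sites in one window, one sequence per condition.
    """
    if len(levels_a) != len(levels_b):
        raise ValueError("both conditions must cover the same CpG sites")
    if bins < 1:
        raise ValueError("bins must be a positive integer")

    counts = [0] * bins
    for a, b in zip(levels_a, levels_b):
        diff = abs(a - b)  # lies in [0, 1] when inputs are in [0, 1]
        # Equal-width bins; a difference of exactly 1.0 goes in the last bin.
        index = min(int(diff * bins), bins - 1)
        counts[index] += 1

    total = len(levels_a)
    if total == 0:
        return [0.0] * bins
    return [count / total for count in counts]


# A window with large differences concentrates mass in the upper bins,
# while a window with no difference concentrates mass in the lowest bin.
dmr_like = methylation_histogram([1.0, 1.0], [0.0, 0.0], bins=4)
non_dmr_like = methylation_histogram([0.5, 0.5], [0.5, 0.5], bins=4)
```

Under these assumptions, feature vectors of this kind, labeled as DMR or non-DMR, are what a linear SVM classifier would be trained on. The separation between the two example windows illustrates why a linear decision boundary could plausibly suffice.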