Abstract:
|
The primary goal of DNA methylation studies is to identify differentially methylated regions (DMRs), that is, regions whose methylation levels differ between conditions. However, single nucleotide polymorphisms (SNPs) can cause methylation readings to be misinterpreted. In this work, we propose leveraging the current trend of collecting both SNP and methylation data from the same individual, so that sites originally filtered out as potential SNPs can be restored. We further propose imputation methods for cases where a SNP is present or other missing-data issues arise. First, we propose regularized linear regression imputation models, together with a variable screening technique that restricts the number of predictors. Second, we propose functional principal component regression imputation as an alternative approach. The proposed methods are compared with existing methods and evaluated on imputation accuracy and DMR detection ability using both real and simulated data. Across a variety of settings, the proposed methods achieve effective imputation accuracy without sacrificing computational efficiency, while greatly increasing the number of true positive DMR detections.
|