Abstract:
Traditional DNA methylation age predictors are based on linear models; relatively few studies have used neural networks, which have the advantage of being able to learn more complex relationships from data. However, DNA methylation data are usually high-dimensional, which makes neural network models prone to overfitting and poor generalization. In this paper, we propose a neural network model called the Correlation Pre-Filtered Neural Network (CPFNN). CPFNN uses Spearman correlation to pre-filter the input features before feeding them into a neural network. We compare CPFNN with statistical regression models (e.g., Horvath's and Hannum's formulas), a basic neural network, and a dropout neural network. CPFNN outperforms these models by at least 1 year in terms of mean absolute error (MAE), achieving an MAE of 2.7 years. We also test whether CPFNN-estimated epigenetic age is associated with schizophrenia and with Down syndrome (p = 0.024 and p < 0.001, respectively). We find that, when the number of candidate features is large, a key factor in improving prediction accuracy is appropriately weighting the features that are highly correlated with the outcome of interest.
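To make the pre-filtering idea concrete, here is a minimal sketch in plain Python. It assumes features are scored by the absolute Spearman correlation between each CpG column and chronological age, and that the top `k` columns are kept. Both the top-`k` rule (the paper may use a threshold instead) and every function name below (`rank_with_ties`, `spearman`, `prefilter_features`, `mean_absolute_error`) are hypothetical illustrations, not the authors' code. The neural network that would consume the filtered columns is omitted.

```python
from typing import List, Sequence, Tuple


def rank_with_ties(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho = Pearson correlation of the rank vectors."""
    return pearson(rank_with_ties(x), rank_with_ties(y))


def prefilter_features(X: List[List[float]], age: Sequence[float], k: int) -> List[int]:
    """Hypothetical pre-filter: indices of the k columns with largest |rho| vs. age."""
    scores: List[Tuple[float, int]] = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        scores.append((abs(spearman(column, age)), j))
    # Sort by descending |rho|, breaking ties by column index.
    scores.sort(key=lambda t: (-t[0], t[1]))
    return sorted(j for _, j in scores[:k])


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE in the same units as the targets (years, for age)."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy data: 4 samples x 3 methylation features (beta values), made up for illustration.
    X = [[0.10, 0.50, 0.9], [0.20, 0.40, 0.1], [0.35, 0.45, 0.8], [0.50, 0.30, 0.2]]
    age = [20, 35, 50, 65]
    print(prefilter_features(X, age, 2))  # columns with |rho| = 1.0 and 0.8
```

On this toy data the three columns have Spearman correlations of 1.0, -0.8 and -0.4 with age, so `prefilter_features(X, age, 2)` keeps columns `[0, 1]`. Ranking by absolute rho keeps strongly negatively correlated sites as well as positive ones, which matters because methylation at some CpG sites decreases with age.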