Abstract:
Polygenic risk scores (PRS) are used to quantify the genetic risk associated with certain diseases or phenotypes. There has been recent interest in developing methods to estimate PRS from summary statistic data. We propose a method to estimate PRS via penalized regression using summary statistic data and published reference data. Our method is similar to the existing method LassoSum but extends its framework to the Truncated Lasso Penalty (TLP). We show via simulation that the TLP can produce sparser effect size estimates than the LASSO penalty. To facilitate model selection, we propose a method of estimating the model fitting criteria AIC and BIC in the absence of validation data. We additionally propose the so-called quasi-correlation metric, which quantifies the predictive accuracy of a polygenic risk score on out-of-sample data for which only summary statistics are available. Together, these methods facilitate the estimation and model selection of PRS on summary statistic data, and the application of these PRS to out-of-sample summary statistic data. We demonstrate the utility of these methods by applying them to genome-wide association studies (GWAS) of lung cancer and height.
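The contrast between the LASSO and the TLP on summary statistics can be illustrated with a minimal sketch. It assumes a LassoSum-style objective, b'R_s b - 2 b'r + 2 * penalty, where r holds SNP-wise marginal correlations derived from GWAS summary statistics and R is an LD matrix from a reference panel, shrunk toward the identity as R_s = (1 - s) R + s I by a hypothetical parameter s. The TLP, lam * min(|b_j| / tau, 1), is handled with a common difference-of-convex scheme: each round solves a weighted lasso in which coefficients already larger than tau are left unpenalized. The function names, the parameters s, tau and n_dc, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def soft_threshold(z, t):
    # Soft-thresholding operator: the closed-form coordinate update for an L1 penalty.
    return np.sign(z) * max(abs(z) - t, 0.0)


def weighted_lasso_summary(r, R, lam_weights, s=0.1, beta0=None, n_iter=200, tol=1e-8):
    """Minimize b' R_s b - 2 b' r + 2 * sum_j w_j |b_j| by coordinate descent.

    Assumption: R_s = (1 - s) R + s I is a shrunken reference-panel LD matrix,
    and r contains marginal SNP-phenotype correlations from summary statistics.
    """
    p = len(r)
    Rs = (1.0 - s) * R + s * np.eye(p)
    beta = np.zeros(p) if beta0 is None else beta0.copy()
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Partial correlation for SNP j after removing the fit of all other SNPs.
            z = r[j] - Rs[j] @ beta + Rs[j, j] * beta[j]
            new = soft_threshold(z, lam_weights[j]) / Rs[j, j]
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta


def tlp_summary(r, R, lam, tau, s=0.1, n_dc=10):
    """Approximately minimize the objective with TLP penalty lam * min(|b_j| / tau, 1).

    Hypothetical helper: each difference-of-convex round is a weighted lasso.
    """
    p = len(r)
    # First round: every coefficient gets the lasso weight lam / tau.
    weights = np.full(p, lam / tau)
    beta = weighted_lasso_summary(r, R, weights, s=s)
    for _ in range(n_dc):
        # Coefficients already above tau are no longer penalized (or shrunk).
        new_weights = np.where(np.abs(beta) > tau, 0.0, lam / tau)
        if np.array_equal(new_weights, weights):
            break
        weights = new_weights
        beta = weighted_lasso_summary(r, R, weights, s=s, beta0=beta)
    return beta


# Toy example: four independent SNPs (identity LD matrix) with made-up correlations.
r = np.array([0.5, 0.05, -0.3, 0.0])
R = np.eye(4)
lam, tau = 0.01, 0.1
lasso_beta = weighted_lasso_summary(r, R, np.full(4, lam / tau))
tlp_beta = tlp_summary(r, R, lam, tau)
print("lasso:", lasso_beta)
print("TLP:  ", tlp_beta)
```

In this independent-SNP toy case the lasso shrinks both retained effects by lam / tau = 0.1, while the TLP leaves effects larger than tau unshrunk; with correlated SNPs the two penalties can also differ in which coefficients are set to zero.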