Abstract:
|
In many fields of genetics and genomics, univariate models serve as a workhorse tool for screening massive numbers of features, many of which may be true signals. Genome-wide association studies (GWAS) epitomize this type of application: mass-univariate models are fitted separately to millions of genetic variants, a large number of which have small but nonzero contributions to human complex traits. We study the cross-trait prediction accuracy of the marginal estimator in this setting. We model GWAS in a general dense high-dimensional framework and compare the marginal estimator to a class of ridge-type conditional estimators, including the best linear unbiased prediction (BLUP) popular in genetics. We show that the relative out-of-sample performance of these estimators depends strongly on the ratio of dimension to sample size, and reveal that the marginal estimator can easily become near-optimal within this class as the dimension increases, even though it is an extremely over-regularized special case. In practice, our analysis offers useful guidance for genome-wide polygenic risk prediction and for the tradeoff between computational cost and accuracy in dense high-dimensional settings.
|
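To make the comparison in the abstract concrete, the following is a minimal simulation sketch, not the authors' method. It assumes i.i.d. standard normal features, a dense Gaussian effect vector, and same-trait (rather than cross-trait) prediction for simplicity. The function name `compare_estimators` and its parameters (`n`, `p`, `h2`, `lambdas`) are hypothetical and chosen only for illustration. It contrasts the marginal estimator X^T y / n with ridge-type estimators (X^T X + lambda I)^{-1} X^T y, scoring each by the out-of-sample correlation between predicted and observed traits.

```python
import numpy as np


def compare_estimators(n=200, p=2000, h2=0.5, n_test=500,
                       lambdas=(1.0, 100.0, 1e12), seed=0):
    """Out-of-sample prediction correlation of marginal vs. ridge estimators.

    Illustrative assumptions (not taken from the paper): independent
    standard normal features, dense effects with total variance h2,
    and training and testing on the same trait.
    """
    rng = np.random.default_rng(seed)

    # Dense signal: every feature has a small nonzero effect.
    beta = rng.normal(0.0, np.sqrt(h2 / p), size=p)

    def draw(m):
        X = rng.normal(size=(m, p))
        y = X @ beta + rng.normal(0.0, np.sqrt(1.0 - h2), size=m)
        return X, y

    X, y = draw(n)
    X_test, y_test = draw(n_test)

    def score(b):
        # Correlation is scale-invariant, so the overall scaling of b is irrelevant.
        return float(np.corrcoef(X_test @ b, y_test)[0, 1])

    # Marginal estimator: one univariate fit per feature.
    results = {"marginal": score(X.T @ y / n), "ridge": {}}

    # Ridge-type estimators, computed in the dual (n x n) form,
    # X^T (X X^T + lam I)^{-1} y, which equals (X^T X + lam I)^{-1} X^T y.
    gram = X @ X.T
    for lam in lambdas:
        alpha = np.linalg.solve(gram + lam * np.eye(n), y)
        results["ridge"][lam] = score(X.T @ alpha)

    return results


if __name__ == "__main__":
    for ratio_p in (100, 1000, 5000):
        print(ratio_p, compare_estimators(n=200, p=ratio_p))
```

As lambda grows, the ridge estimator becomes proportional to X^T y, so its prediction correlation coincides with that of the marginal estimator; this is the sense in which the marginal estimator is an extremely over-regularized special case. Varying `p` relative to `n` in the sketch gives a rough feel for how the gap between the estimators changes with the dimension to sample size ratio under these simplified assumptions.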