Abstract:
Genome-wide association studies (GWASs) traditionally concern the associations between genetic variants known as single-nucleotide polymorphisms (SNPs) and traits such as diseases. Recently, however, the focus has shifted towards clinical translation and personalized medicine, which makes it relevant to also consider risk prediction approaches. In this study, a GWAS cohort of ~3000 type 2 diabetes cases and ~3000 non-diabetic controls is analyzed with this shift in mind. We study a traditional univariate association analysis as well as risk prediction using traditional logistic regression and the non-linear machine learning algorithm random forest. In contrast to the finding of genome-wide significant associations, the information carried in the SNPs does not necessarily improve predictive performance significantly. In this talk, we discuss the statistical challenges of moving from associations at the population level to predictions for individuals. We do so in the context of GWAS data, where a limited number of observations and a large number of variables, each with a small effect size, make individual prediction a challenging problem.
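The gap between association and prediction can be illustrated with a single SNP. The following is a minimal Python sketch, not the analysis from this study: the genotype counts are hypothetical (chosen to match ~3000 cases with minor allele frequency 0.30 and ~3000 controls with minor allele frequency 0.25), and the allelic chi-square test (1 degree of freedom, no continuity correction) and the AUC of an additive genotype score are standard choices assumed here for illustration. The helper names `allelic_test` and `genotype_auc` are hypothetical.

```python
import math


def allelic_test(case_geno, control_geno):
    """Allelic chi-square test (1 df, no continuity correction) for one SNP.

    case_geno / control_geno: counts of individuals carrying 0, 1 and 2 copies
    of the minor allele. Returns (chi2, p_value).
    """
    # Collapse genotype counts into a 2x2 table of minor/major allele counts.
    a = case_geno[1] + 2 * case_geno[2]          # case minor alleles
    b = 2 * sum(case_geno) - a                   # case major alleles
    c = control_geno[1] + 2 * control_geno[2]    # control minor alleles
    d = 2 * sum(control_geno) - c                # control major alleles
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def genotype_auc(case_geno, control_geno):
    """AUC of the additive score (0/1/2 minor alleles) as a case/control classifier.

    Computed as P(score_case > score_control) + 0.5 * P(tie), i.e. the
    normalized Mann-Whitney U statistic evaluated directly on genotype counts.
    """
    wins = ties = 0
    for i, n_case in enumerate(case_geno):
        for j, n_ctrl in enumerate(control_geno):
            if i > j:
                wins += n_case * n_ctrl
            elif i == j:
                ties += n_case * n_ctrl
    return (wins + 0.5 * ties) / (sum(case_geno) * sum(control_geno))


# Hypothetical genotype counts (0, 1, 2 minor alleles), illustration only.
cases = [1470, 1260, 270]       # 3000 cases, minor allele frequency 0.30
controls = [1688, 1124, 188]    # 3000 controls, minor allele frequency 0.25
chi2, p = allelic_test(cases, controls)
auc = genotype_auc(cases, controls)
print(f"chi2 = {chi2:.1f}, p = {p:.1e}, AUC = {auc:.3f}")
```

With these made-up counts (allelic odds ratio of about 1.29), the SNP passes the conventional genome-wide threshold of p < 5e-8 comfortably (chi2 ≈ 37.6), yet its genotype used alone as a risk score gives an AUC of only about 0.54, barely above the 0.5 of random guessing.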