
Abstract Details

Activity Number: 192 - Contributed Poster Presentations: SSC
Type: Contributed
Date/Time: Monday, July 29, 2019 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistics in Genomics and Genetics
Abstract #306659
Title: Feature Selection Bias in Assessing the Predictivity of SNPs for Alzheimer's Disease
Author(s): Mei Dong* and Longhai Li
Companies: University of Saskatchewan and University of Saskatchewan
Keywords: feature selection bias; cross validation; predictive analysis; lasso; GWAS; Alzheimer's disease
Abstract:

When identifying SNPs related to a phenotype of interest, we consider how to assess the predictivity of SNPs selected by performing a GWAS. In internal cross-validation (ICV), a subset of SNPs is pre-selected using all samples, which biases the predictivity estimate upward. In external cross-validation (ECV), features are re-selected using only the training samples, so that feature selection is external to the test samples. The feature selection bias of ICV has not received sufficient attention in prediction with SNP data. Using Alzheimer's disease as an example, we demonstrate that ICV can lead to severe false discoveries. We use one real SNP dataset and two synthetic datasets, and for prediction we compare the performance of three regularized logistic regression methods. For the late-onset Alzheimer's disease (LOAD) dataset, ECV shows that no SNPs other than APOE improve the prediction of LOAD; however, the predictivity estimate of the selected SNPs given by ICV can reach an R^2 of 80%. The results on the synthetic datasets are similar to those on the real data. We also found that hyper-LASSO performs better than LASSO and elastic net. We recommend that ICV not be used to measure the predictivity of selected SNPs, and that this caveat be stated clearly.
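The difference between ICV and ECV can be illustrated with a small Python sketch. It is not the authors' pipeline. It makes several simplifying assumptions: the genotypes (0/1/2) are simulated independently of a random binary phenotype, so no SNP is truly predictive. SNPs are ranked by the absolute difference in mean genotype between cases and controls, which stands in for the GWAS screening step. A nearest-centroid classifier stands in for LASSO, elastic net and hyper-LASSO, and cross-validated accuracy is used instead of R^2. All function names are hypothetical. The only difference between the two schemes is whether feature selection runs once on all samples (ICV) or is repeated inside each fold on the training samples only (ECV).

```python
import random
import statistics


def simulate_null_snps(n_samples, n_snps, seed):
    """Hypothetical null data: genotypes (0/1/2) drawn independently of a
    balanced random binary phenotype, so no SNP is truly predictive."""
    rng = random.Random(seed)
    X = [[rng.choice((0, 1, 2)) for _ in range(n_snps)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]  # equal numbers of cases and controls
    rng.shuffle(y)
    return X, y


def select_top_snps(X, y, rows, k):
    """Rank SNPs by |mean genotype in cases - mean genotype in controls|,
    using only the samples listed in `rows`, and return the top k indices."""
    cases = [i for i in rows if y[i] == 1]
    controls = [i for i in rows if y[i] == 0]
    scores = []
    for j in range(len(X[0])):
        diff = (statistics.fmean(X[i][j] for i in cases)
                - statistics.fmean(X[i][j] for i in controls))
        scores.append((abs(diff), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def nearest_centroid_predict(X, y, train, test, feats):
    """Fit class centroids on `train` and label each test sample by the closer
    centroid (squared Euclidean distance over the selected SNPs)."""
    centroids = {}
    for label in (0, 1):
        members = [i for i in train if y[i] == label]
        centroids[label] = [statistics.fmean(X[i][j] for i in members) for j in feats]
    preds = []
    for i in test:
        dist = {label: sum((X[i][j] - c) ** 2 for j, c in zip(feats, centroids[label]))
                for label in (0, 1)}
        preds.append(min(dist, key=dist.get))
    return preds


def cv_accuracy(X, y, k, n_folds, external, seed):
    """K-fold CV accuracy. external=False is ICV, external=True is ECV."""
    n = len(y)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[f::n_folds] for f in range(n_folds)]
    # ICV: a single selection on ALL samples, before any split (test labels leak).
    fixed = None if external else select_top_snps(X, y, range(n), k)
    correct = 0
    for test in folds:
        test_set = set(test)
        train = [i for i in order if i not in test_set]
        # ECV: re-select inside each fold, using only the training samples.
        feats = select_top_snps(X, y, train, k) if external else fixed
        preds = nearest_centroid_predict(X, y, train, test, feats)
        correct += sum(p == y[i] for p, i in zip(preds, test))
    return correct / n


if __name__ == "__main__":
    icv_runs, ecv_runs = [], []
    for rep in range(5):
        X, y = simulate_null_snps(n_samples=40, n_snps=1000, seed=rep)
        icv_runs.append(cv_accuracy(X, y, k=10, n_folds=5, external=False, seed=rep))
        ecv_runs.append(cv_accuracy(X, y, k=10, n_folds=5, external=True, seed=rep))
    print(f"ICV mean accuracy: {statistics.fmean(icv_runs):.2f}")
    print(f"ECV mean accuracy: {statistics.fmean(ecv_runs):.2f}")
```

On null data like this, ECV accuracy should hover around chance (0.5). ICV accuracy should be noticeably higher, even though no SNP carries any signal, because the test samples have already influenced which SNPs were kept.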


Authors who are presenting talks have a * after their name.

Back to the full JSM 2019 program