Abstract:
Ultrahigh-dimensional data analysis brings unprecedented opportunities and major challenges for statistical modeling. In genome-wide association studies, it is common to have hundreds of thousands of genetic markers but only a limited sample size. To identify which predictors determine a response such as human disease or a biological trait, variable selection techniques are needed to evaluate each predictor individually. The traditional distance correlation approach can rank predictors by their relative importance; however, it does not provide a threshold and therefore cannot determine how many predictors are important. In this talk, I will present a Backward Elimination Distance Correlation approach for detecting the most important predictors that are strongly associated with the response. The proposed approach works for both categorical and continuous responses and predictors, with few restrictions on model specification. Additionally, it adaptively selects a suggested number of important predictors by minimizing the mean squared prediction error. The power of our approach has been verified on both simulated and real data.
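To make the ranking step concrete, the following is a minimal sketch of marginal screening with the classical sample distance correlation. It illustrates only the traditional ranking that the abstract contrasts against, not the Backward Elimination procedure, whose details the abstract does not give. The function names (`distance_correlation`, `rank_predictors`) and the toy data are assumptions made for illustration.

```python
import numpy as np


def _centered_distances(x):
    """Double-centered pairwise Euclidean distance matrix of the samples in x."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a vector as n samples of one feature
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    row_mean = d.mean(axis=1, keepdims=True)
    col_mean = d.mean(axis=0, keepdims=True)
    return d - row_mean - col_mean + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation between x and y (value in [0, 1])."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()      # squared sample distance covariance
    dvar_x2 = (a * a).mean()    # squared sample distance variance of x
    dvar_y2 = (b * b).mean()    # squared sample distance variance of y
    denom = np.sqrt(dvar_x2 * dvar_y2)
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def rank_predictors(X, y):
    """Rank columns of X by distance correlation with y, strongest first."""
    scores = np.array(
        [distance_correlation(X[:, j], y) for j in range(X.shape[1])]
    )
    order = np.argsort(-scores)
    return order, scores


# Toy data (hypothetical): only column 2 drives the response, nonlinearly.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 2] ** 2
order, scores = rank_predictors(X, y)
```

The ranking orders the predictors but, as noted above, offers no cutoff for how many to keep. This sketch also builds full n-by-n distance matrices, so it suits modest sample sizes only.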
Copyright © American Statistical Association.