Abstract:
|
Sure independence screening (SIS) is a well-known procedure, based on the Pearson correlation, for feature selection in linear models with high- and ultrahigh-dimensional data (Fan and Lv, 2008). Yet it focuses mainly on linear models with a continuous response variable. In this paper, we extended the SIS method to high-dimensional quadratic generalized linear models with a binary response variable. We developed an SIS procedure based on the point-biserial correlation, which measures the correlation between a continuous variable and a binary random variable. The point-biserial sure independence screening (PB-SIS) procedure can be implemented as straightforwardly as the original SIS procedure, but it is tailored to high-dimensional generalized linear models with interaction terms. We established the sure screening property for the PB-SIS method and conducted simulation studies to evaluate its performance. The simulation studies showed that PB-SIS performs better than some traditional generalized linear model selection methods. We also demonstrated its use on a real data example.
|