Abstract:
|
A major challenge in current statistics is variable selection. Although many methods have been proposed in the literature over the past years, their performance is generally limited in recall and precision when the number of variables vastly exceeds the number of observations or when the variables are highly correlated.
We improve the performance of existing classification models, for instance regression-based ones, using correlated resampling techniques. Accounting for this correlation structure is the fundamental strength of our approach, which allows us to select reliable variables in both parsimonious and non-parsimonious classification problems. For example, we increase the performance of glmnet logistic models, variational approximation methods for a binary response, spls discriminant analysis models and sparse generalized PLS models. In addition, we compute a confidence index based on the resamplings that helps assess the stability of each variable the model may select.
We demonstrate the performance increase due to our method with a comprehensive benchmark based on both simulated and real data sets.
|