All Times EDT

Abstract Details

Activity Number: 321 - Machine Learning and Variable Selection
Type: Contributed
Date/Time: Wednesday, August 11, 2021 : 3:30 PM to 5:20 PM
Sponsor: Section on Statistical Computing
Abstract #318902
Title: Improving Variable Selection in Linear Models Using the Select Boost Algorithm
Author(s): Myriam Maumy* and Frederic Bertrand
Companies: Université de technologie de Troyes and Université de technologie de Troyes
Keywords: Variable selection; Data resampling; Correlated data; Stability; Precision-selection trade-off; Linear models
Abstract:

With the growth of big data, variable selection has become one of the critical challenges in statistics. Although many methods have been proposed in the literature, their performance in terms of recall and precision is limited when the number of variables far exceeds the number of observations or when the variables are highly correlated.

We propose a general algorithm which improves the precision of any existing variable selection method. This algorithm is based on highly intensive simulations and takes into account the correlation structure of the data.

Users of selectBoost can apply the algorithm to produce a confidence index or to choose an appropriate precision-selection trade-off, so that variables are selected with high confidence and non-predictive features are avoided. The main idea is to account for the correlation structure of the data and to use intensive computation to identify reliable variables.
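The idea of a confidence index can be illustrated with a minimal Python sketch. It is not the selectBoost implementation: the abstract does not specify how the data are perturbed, so this sketch assumes a simple scheme in which each variable is randomly swapped with a member of its correlation group (variables whose absolute correlation with it is at least a threshold, here called c0). The names pearson, top_k_selector, confidence_index, c0 and n_rep are hypothetical, and top_k_selector is a toy stand-in for a real selection method such as the lasso.

```python
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((v - mb) ** 2 for v in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / (va * vb) ** 0.5


def top_k_selector(columns, y, k=2):
    """Toy selector (assumption, not the lasso): keep the k columns
    with the largest absolute correlation with the response."""
    scores = [abs(pearson(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: -scores[j])
    return set(ranked[:k])


def confidence_index(columns, y, selector, c0=0.9, n_rep=200, seed=0):
    """Proportion of perturbed data sets in which each variable is selected."""
    rng = random.Random(seed)
    p = len(columns)
    # Correlation group of variable i: itself plus every variable with |r| >= c0.
    groups = [
        [j for j in range(p)
         if j == i or abs(pearson(columns[i], columns[j])) >= c0]
        for i in range(p)
    ]
    counts = [0] * p
    for _ in range(n_rep):
        # Assumed perturbation: swap each variable for a random group member.
        perturbed = [columns[rng.choice(g)] for g in groups]
        for j in selector(perturbed, y):
            counts[j] += 1
    return [c / n_rep for c in counts]


# Simulated data: x1 is a near copy of x0, x2 is an independent predictor,
# x3 is pure noise, and the response depends on x0 and x2 only.
data_rng = random.Random(1)
n = 200
x0 = [data_rng.gauss(0, 1) for _ in range(n)]
x1 = [v + 0.1 * data_rng.gauss(0, 1) for v in x0]
x2 = [data_rng.gauss(0, 1) for _ in range(n)]
x3 = [data_rng.gauss(0, 1) for _ in range(n)]
y = [a + 2 * b + 0.5 * data_rng.gauss(0, 1) for a, b in zip(x0, x2)]

conf = confidence_index([x0, x1, x2, x3], y, top_k_selector)
print([round(c, 2) for c in conf])
```

In this setup the independent predictor x2 should keep a high confidence index, while x0 and x1 share theirs because the selector cannot tell them apart once they are swapped. Keeping only variables above a high confidence threshold then favours precision over recall, which is the trade-off the abstract refers to.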

We succeeded in improving the precision of the lasso selection method while keeping recall and F-score relatively stable, and we demonstrate the performance of our algorithm on simulated and real data for linear models.

Available as an R package on CRAN.


Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program