Abstract:
|
Feature selection in regression analysis of big data remains a challenge. A common assumption in feature selection for large regression data is that the random errors are homoscedastic. In this study, we present a Subsampling Winner Algorithm (SWA) for feature selection in large regression data when the errors are heteroscedastic and their variances must be estimated. The idea of SWA is analogous to the selection of national merit scholars, and in principle SWA can handle linear regression data of any dimension. Parametric and nonparametric methods are used to estimate the weights. We also compare our procedure with benchmark procedures such as Elastic Net, SCAD, MCP, and Random Forest.
|
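To make the scholarship analogy concrete, the following is a minimal Python sketch of one loose reading of it, not the authors' implementation: many small random subsets of features are fitted by weighted least squares, the best-fitting subsets are kept, and each feature is scored by its average absolute t-statistic within those subsets. The function names (`weighted_ls`, `subsampling_winner_sketch`), the parameters `s`, `m`, `q`, and `n_keep`, the scoring rule, and the toy data are all assumptions made for illustration. In particular, the weights here are taken as known inverse variances, whereas the abstract states that they are estimated.

```python
import numpy as np

def weighted_ls(X, y, w):
    """Weighted least squares via row rescaling by sqrt(w).

    Returns the weighted residual sum of squares and the absolute
    t-statistic of each coefficient.
    """
    sw = np.sqrt(w)
    Xw, yw = X * sw[:, None], y * sw
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    resid = yw - Xw @ beta
    rss = float(resid @ resid)
    dof = X.shape[0] - X.shape[1]
    cov = rss / dof * np.linalg.inv(Xw.T @ Xw)
    return rss, np.abs(beta / np.sqrt(np.diag(cov)))

def subsampling_winner_sketch(X, y, w, s=5, m=500, q=50, n_keep=5, seed=0):
    """Hypothetical subsample-and-score scheme (illustrative assumption).

    s: features per sub-model, m: number of sub-models,
    q: number of best sub-models kept, n_keep: features returned.
    """
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    fits = []
    for _ in range(m):
        idx = rng.choice(p, size=s, replace=False)  # one random sub-model
        rss, abs_t = weighted_ls(X[:, idx], y, w)
        fits.append((rss, idx, abs_t))
    fits.sort(key=lambda f: f[0])  # smallest weighted RSS first

    score_sum, count = np.zeros(p), np.zeros(p)
    for _, idx, abs_t in fits[:q]:  # only the q best sub-models contribute
        score_sum[idx] += abs_t
        count[idx] += 1
    # Average |t| per feature; features never seen in a top sub-model score 0.
    scores = np.divide(score_sum, count, out=np.zeros(p), where=count > 0)
    return np.argsort(scores)[::-1][:n_keep], scores

# Toy data (assumed): 3 true features out of 30, heteroscedastic noise.
rng = np.random.default_rng(1)
n, p = 400, 30
X = rng.standard_normal((n, p))
sd = rng.uniform(0.5, 3.0, size=n)  # per-observation error standard deviation
y = 3 * X[:, 0] - 3 * X[:, 1] + 3 * X[:, 2] + sd * rng.standard_normal(n)
selected, scores = subsampling_winner_sketch(X, y, w=1.0 / sd**2)
```

In this toy setup `selected` holds the indices of the `n_keep` highest-scoring features. In practice the inverse-variance weights would have to be estimated first, for example from the residuals of a preliminary fit, which this sketch deliberately leaves out.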