
Abstract Details

Activity Number: 519 - Sparse Statistical Learning
Type: Contributed
Date/Time: Wednesday, August 2, 2017 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #322879
Title: Properties of Data-Dependent Variable Selection Methods
Author(s): Sen Tian* and Clifford M. Hurvich and Jeffrey S. Simonoff
Companies: New York University, Stern School of Business (all authors)
Keywords: Regularization ; Best subset selection ; LASSO ; SCAD
Abstract:

Variable selection plays a crucial role in modern statistical learning. Regularization methods, which can select variables and estimate their effects simultaneously, have been developed and studied extensively over the last twenty years. These methods involve the selection of a regularization parameter. Most recent theoretical and empirical studies focus on a deterministic choice of the regularization parameter; examples include oracle inequalities that describe the worst-case predictive accuracy of such methods, and the effective degrees of freedom that describe the optimism of such methods, both for a fixed regularization parameter. In practice, however, the regularization parameter is selected via a data-dependent procedure. In this paper, we study the properties of three data-dependent variable selection methods: best subset selection, LASSO, and SCAD. Assuming orthogonal predictors, we investigate the optimism and predictive accuracy of these methods. We demonstrate a discrepancy between how these methods perform in practice and how the theoretical results in the literature suggest they perform. A possible solution is also discussed.
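The distinction the abstract draws, a fixed regularization parameter versus a data-dependent one, can be illustrated with a minimal sketch. Under an orthogonal design with unit-norm columns, the LASSO solution is soft-thresholding of the OLS estimates, which are independent normals centered at the true coefficients. The sketch below (an illustration under these assumptions, not the authors' code; the simulation settings and the validation-based rule for choosing lambda are hypothetical choices for demonstration) picks the threshold by minimizing squared error against an independent validation draw, making lambda a function of the data.

```python
import math
import random

def soft_threshold(z, lam):
    # LASSO solution for one coordinate under an orthogonal design:
    # shrink the OLS estimate z toward zero by lam, truncating at zero.
    return math.copysign(max(abs(z) - lam, 0.0), z)

random.seed(1)
p = 50
sigma = 1.0
beta = [3.0] * 5 + [0.0] * (p - 5)  # sparse true coefficients (assumed setup)

# Under orthogonality, the OLS estimates z_j are independent N(beta_j, sigma^2).
z_train = [b + random.gauss(0, sigma) for b in beta]
z_valid = [b + random.gauss(0, sigma) for b in beta]

# Data-dependent choice of lambda: minimize squared error against an
# independent validation draw, over a grid of candidate values.
grid = [0.1 * k for k in range(31)]

def valid_err(lam):
    return sum((soft_threshold(zt, lam) - zv) ** 2
               for zt, zv in zip(z_train, z_valid))

lam_hat = min(grid, key=valid_err)  # chosen lambda now depends on the data
```

Because `lam_hat` is itself a random quantity, properties derived for a fixed lambda (oracle bounds, fixed-lambda degrees of freedom) need not carry over to the selected-lambda procedure, which is the gap the paper studies.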


Authors who are presenting talks have a * after their name.


Copyright © American Statistical Association