Abstract Details

Activity Number: 414 - Model Building and Selection
Type: Contributed
Date/Time: Tuesday, August 1, 2017 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Computing
Abstract #324851
Title: Model-Selection Consistency of Forward Selection in High-Dimensional Regression
Author(s): Jerzy Wieczorek* and Jing Lei
Companies: Carnegie Mellon University and Carnegie Mellon University
Keywords: model selection ; cross validation ; forward selection ; linear regression
Abstract:

Variable selection procedures for regression are widely used across quantitative fields to estimate sparser, more interpretable models. When analyzing large, high-dimensional datasets, greedy selection algorithms such as Forward Selection (FS) are valued for their low computational cost and their ability to handle more variables than observations (p > n). We derive sufficient conditions for FS to attain exact recovery of the true model support in the deterministic case, as well as for model-selection consistency in the random case. Our conditions allow p to grow with n. For situations where the true model size is not known, we develop a consistent stopping rule based on a sequential variant of Monte Carlo Cross Validation. Finally, for linear models, model-selection-consistent cross validation requires the data-splitting (training/testing) ratio to go to 0 asymptotically, which is not a practical ratio for any finite sample. We therefore provide pragmatic suggestions for the data-splitting ratio in large but finite samples, as well as heuristic advice for balancing underfitting against overfitting in small samples.
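
To make the two ingredients of the abstract concrete, the following is a minimal illustrative sketch of greedy forward selection combined with plain Monte Carlo cross validation for choosing the model size. It is not the authors' procedure: the abstract describes a sequential variant of Monte Carlo Cross Validation that is not specified here, so ordinary repeated random splitting is swapped in. The function names (`forward_selection`, `mccv_model_size`), the column-major list layout, the absence of an intercept, and the defaults `train_frac=0.5` and `n_splits=20` are all assumptions made for illustration.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def forward_selection(cols, y, max_steps, test_cols=None, y_test=None):
    """Greedy forward selection on column-major data (hypothetical helper).

    cols: list of p columns, each a list of n floats (no intercept is fitted).
    Returns (path, test_mse): selected column indices in order and, when test
    data are supplied, the test mean squared error after 0, 1, 2, ... steps.
    """
    z = [list(c) for c in cols]          # columns, orthogonalized as we go
    r = list(y)                          # current training residual
    zt = [list(c) for c in test_cols] if test_cols is not None else None
    rt = list(y_test) if y_test is not None else None
    path, test_mse = [], []
    if rt is not None:
        test_mse.append(dot(rt, rt) / len(rt))
    remaining = set(range(len(cols)))
    for _ in range(min(max_steps, len(cols), len(y))):
        # Adding column j lowers the training RSS by (z_j . r)^2 / (z_j . z_j).
        best_j, best_gain = None, 0.0
        for j in sorted(remaining):
            norm2 = dot(z[j], z[j])
            if norm2 > 1e-12:            # skip (near-)collinear columns; ad hoc tolerance
                gain = dot(z[j], r) ** 2 / norm2
                if gain > best_gain:
                    best_j, best_gain = j, gain
        if best_j is None:
            break
        zb = z[best_j]
        norm2 = dot(zb, zb)
        g = dot(zb, r) / norm2
        r = [ri - g * zi for ri, zi in zip(r, zb)]
        if rt is not None:
            # Apply the training-fitted coefficient to the held-out rows.
            rt = [ri - g * zi for ri, zi in zip(rt, zt[best_j])]
            test_mse.append(dot(rt, rt) / len(rt))
        remaining.remove(best_j)
        for j in remaining:
            # Orthogonalize the remaining columns against the chosen one,
            # mirroring the same linear transformation on the test rows.
            c = dot(z[j], zb) / norm2
            z[j] = [a - c * b for a, b in zip(z[j], zb)]
            if zt is not None:
                zt[j] = [a - c * b for a, b in zip(zt[j], zt[best_j])]
        path.append(best_j)
    return path, test_mse


def mccv_model_size(cols, y, max_steps, train_frac=0.5, n_splits=20, seed=0):
    """Choose a model size by plain Monte Carlo CV (repeated random splits)."""
    rng = random.Random(seed)
    n = len(y)
    n_train = int(round(train_frac * n))
    total = [0.0] * (max_steps + 1)
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        tr, te = idx[:n_train], idx[n_train:]
        _, mse = forward_selection(
            [[c[i] for i in tr] for c in cols], [y[i] for i in tr], max_steps,
            [[c[i] for i in te] for c in cols], [y[i] for i in te])
        # If selection stopped early, carry the last error forward.
        mse += [mse[-1]] * (max_steps + 1 - len(mse))
        total = [t + m for t, m in zip(total, mse)]
    avg = [t / n_splits for t in total]
    return avg.index(min(avg)), avg
```

In this sketch, `train_frac` plays the role of the data-splitting ratio discussed in the abstract: a smaller value leaves more data for testing, which penalizes overfit models more sharply but fits each candidate model on fewer observations.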


Authors who are presenting talks have a * after their name.

Back to the full JSM 2017 program

 
 
Copyright © American Statistical Association