Abstract:
|
Feature selection from a large number p of covariates in a regression analysis is a central challenge in data science, especially when scaling to ever-growing data and identifying scientifically important features. The modern approach to feature selection in large-p data uses penalized likelihood or shrinkage estimation, such as the LASSO, SCAD, Elastic Net, or MCP procedures. The randomForest procedure is another alternative. We present a different approach using a new subsampling method, called the Subsampling Winner algorithm (SWA), which subsamples from the p features rather than from the n observations. Owing to its subsampling nature, SWA can in principle scale to data of any dimension. In a linear regression setting, SWA has the best-controlled false discovery rate among the aforementioned procedures while maintaining a competitive true-feature discovery rate. We investigate the reasons behind its strong performance, provide practical strategies to doubly assure an SWA selection, and discuss its extension to more general settings. We also discuss computational improvements and SWA's relation to some machine-learning algorithms.
|
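The abstract does not spell out the algorithm, but the core idea of subsampling features rather than observations can be illustrated with a minimal sketch. The Python code below is a hypothetical illustration, not the authors' implementation: the subsample size s, the number of rounds B, and the |t|-statistic "winner" rule are all illustrative assumptions. Each round draws a random s-feature subset, fits ordinary least squares on it, and credits the feature with the largest |t|-statistic as that round's winner; features are then ranked by win counts.

```python
# Hypothetical sketch of a feature-subsampling selection scheme in the
# spirit of SWA; the subset size, round count, and winner criterion are
# assumptions, not the authors' published algorithm.
import numpy as np

def subsampling_winners(X, y, s=10, B=500, seed=0):
    """Tally how often each feature 'wins' an OLS fit on a random
    s-feature subsample; X is n x p, y has length n."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    wins = np.zeros(p)
    for _ in range(B):
        idx = rng.choice(p, size=s, replace=False)      # subsample features, not rows
        Xs = np.column_stack([np.ones(n), X[:, idx]])   # design matrix with intercept
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)   # OLS fit on the subsample
        resid = y - Xs @ beta
        sigma2 = resid @ resid / max(n - s - 1, 1)      # residual variance estimate
        cov = sigma2 * np.linalg.pinv(Xs.T @ Xs)        # coefficient covariance
        tstats = np.abs(beta[1:]) / np.sqrt(np.diag(cov)[1:])
        wins[idx[np.argmax(tstats)]] += 1               # credit this round's winner
    return wins  # rank features by their win counts

# Example: only the first 3 of 100 features matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 100))
y = X[:, 0] + 0.8 * X[:, 1] - 0.6 * X[:, 2] + rng.normal(size=200)
print(np.argsort(subsampling_winners(X, y))[::-1][:5])  # top-5 features by wins
```

Because each round touches only s of the p features, the per-round cost is independent of p, which is consistent with the abstract's claim that a feature-subsampling scheme can scale to data of any dimension in principle.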