
All Times EDT

Abstract Details

Activity Number: 500 - Invited Papers: Journal of Statistical Analysis and Data Mining
Type: Invited
Date/Time: Thursday, August 11, 2022 : 8:30 AM to 10:20 AM
Sponsor: Journal of Statistical Analysis and Data Mining
Abstract #320526
Title: Subsampling Winner Algorithm from the Feature Space for Feature Selection in Large Regression Data: A Paradigm Shift
Author(s): Jiayang Sun* and Yiying Richard Fan
Companies: George Mason University and Cleveland State University
Keywords: Subsampling; Feature Selection; FDR; Algorithm; High-Dimensional Data; Regression
Abstract:

Feature selection from a large number p of covariates in a regression analysis is a challenge for data science, especially when scaling to ever-larger data and when seeking scientifically important features. The modern approach to feature selection in large-p data uses penalized likelihood or shrinkage estimation, such as the LASSO, SCAD, Elastic Net, or MCP procedures; the randomForest procedure is another alternative. We present a different approach based on a new subsampling method, the Subsampling Winner Algorithm (SWA), which subsamples from the p features rather than from the n observations. Because of this subsampling design, SWA can in principle scale to data of any dimension. In a linear regression setting, SWA achieves the best-controlled false discovery rate among the aforementioned procedures while maintaining a competitive true feature discovery rate. We investigate the reasons behind its good performance, provide practical strategies to double-check an SWA selection, and discuss its extension to more general settings. We also discuss computational improvements and SWA's relation to some machine learning algorithms.
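The abstract describes SWA only at a high level: subsample from the features (columns) rather than the observations (rows), and keep the "winners". The Python sketch below illustrates one plausible reading of that idea; it is not the authors' published algorithm. The subsample size `s`, the number of subsamples `m`, the winner count `q`, the RSS-based subsample score, the frequency-based feature ranking, and the helper names `subsampling_winner_sketch` and `fdp_and_tdr` are all hypothetical choices made for illustration. The second helper computes the false discovery proportion and true discovery rate, the two quantities the abstract uses to compare procedures.

```python
import numpy as np


def rss_ols(X, y):
    """Residual sum of squares of an OLS fit of y on the columns of X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def subsampling_winner_sketch(X, y, s=5, m=2000, q=50, k=5, seed=0):
    """HYPOTHETICAL SWA-style selector (an assumption, not the authors' exact method).

    Draws m random subsets of s FEATURES (columns, not rows), fits OLS on each,
    keeps the q subsets with the lowest RSS as "winners", and returns the k
    features that appear most often among those winners.
    """
    rng = np.random.default_rng(seed)
    _, p = X.shape
    subsets = [rng.choice(p, size=s, replace=False) for _ in range(m)]
    scores = np.array([rss_ols(X[:, idx], y) for idx in subsets])
    winners = np.argsort(scores)[:q]  # lowest RSS = best-fitting subsamples
    counts = np.zeros(p, dtype=int)
    for w in winners:
        counts[subsets[w]] += 1  # indices are distinct, so each feature counts once per winner
    return np.sort(np.argsort(-counts, kind="stable")[:k])


def fdp_and_tdr(selected, true_support):
    """False discovery proportion and true discovery rate of one selection."""
    selected, true_support = set(selected), set(true_support)
    tp = len(selected & true_support)
    fdp = (len(selected) - tp) / max(len(selected), 1)
    tdr = tp / max(len(true_support), 1)
    return fdp, tdr


if __name__ == "__main__":
    # Simulated linear model: only features 0..4 carry signal (illustrative data only).
    rng = np.random.default_rng(1)
    n, p = 200, 50
    X = rng.standard_normal((n, p))
    true_support = [0, 1, 2, 3, 4]
    y = 3.0 * X[:, true_support].sum(axis=1) + rng.standard_normal(n)
    sel = subsampling_winner_sketch(X, y, s=5, m=3000, q=100, k=5)
    print("selected:", sel, "FDP, TDR:", fdp_and_tdr(sel, true_support))
```

In this sketch each regression fit involves only `s` columns, so the cost of an individual fit does not grow with p; that is the property the abstract appeals to when it says SWA can in principle scale to data of any dimension.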


Authors who are presenting talks have a * after their name.

Back to the full JSM 2022 program