
Abstract Details

Activity Number: 677 - Variable Selection Methods in Statistical Learning
Type: Contributed
Date/Time: Thursday, August 2, 2018 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #329204 Presentation
Title: Subsampling for Feature Selection in Large Regression Data
Author(s): Yiying Fan*
Companies: Cleveland State University
Keywords: Feature selection; Subsampling; Regression

Feature selection has many applications, and it is challenging when the number of features is truly large, especially when that number keeps growing. We present a new approach to feature selection in large-data regression analysis, called the Subsampling Winner Algorithm (SWA). The central idea is analogous to the way National Merit Scholars are selected. SWA applies a 'base procedure' to each subsample, scores every feature according to its performance across all subsample analyses, selects the 'semifinalists' based on the resulting scores, and finally determines the 'finalists', i.e., the most important features, from among the semifinalists. We compare SWA with current benchmark procedures based on penalized criteria and on random forests, for both independent and correlated features. We illustrate its application to a genomic data set on ovarian cancer.
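The abstract leaves the subsampling scheme, the base procedure, the scoring rule and the finalist rule unspecified. The sketch below fills each gap with an explicit assumption, so it is an illustration of the semifinalist/finalist idea rather than the author's method. It assumes that (1) a subsample is a random subset of `s` features, (2) the base procedure is ordinary least squares with the absolute t-statistic as each feature's performance, (3) a feature's score is its average |t| over the subsamples that contain it, (4) the `q` top-scoring features are the semifinalists, and (5) finalists are semifinalists whose |t| in a joint least-squares fit exceeds `t_cut`. The names `swa`, `ols_abs_t`, `invert` and the parameters `s`, `m`, `q`, `t_cut` are all hypothetical.

```python
import math
import random


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols_abs_t(X, y, cols):
    """Assumed base procedure: OLS of y on an intercept plus `cols`; returns |t| per column."""
    rows = [[1.0] + [x[c] for c in cols] for x in X]
    n, k = len(rows), len(cols) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = max(sum(e * e for e in resid) / (n - k), 1e-12)  # guard against a perfect fit
    return {c: abs(beta[i + 1]) / math.sqrt(sigma2 * inv[i + 1][i + 1])
            for i, c in enumerate(cols)}


def swa(X, y, s=3, m=200, q=5, t_cut=2.0, seed=0):
    """Hypothetical SWA-style selection; returns (semifinalists, finalists) as column indices."""
    rng = random.Random(seed)
    p = len(X[0])
    total, count = [0.0] * p, [0] * p
    # Run the base procedure on m random feature subsets and accumulate each feature's |t|.
    for _ in range(m):
        for c, t in ols_abs_t(X, y, rng.sample(range(p), s)).items():
            total[c] += t
            count[c] += 1
    # Score = average |t| over the subsamples containing the feature (0 if never drawn).
    score = [total[c] / count[c] if count[c] else 0.0 for c in range(p)]
    semifinalists = sorted(range(p), key=lambda c: score[c], reverse=True)[:q]
    # Finalists: semifinalists that remain significant in one joint fit.
    final_t = ols_abs_t(X, y, semifinalists)
    finalists = sorted(c for c in semifinalists if final_t[c] > t_cut)
    return semifinalists, finalists


if __name__ == "__main__":
    # Simulated data (not from the study): only features 0 and 1 affect y.
    rng = random.Random(1)
    X = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(100)]
    y = [3 * x[0] - 2 * x[1] + rng.gauss(0, 1) for x in X]
    print(swa(X, y))
```

In this toy simulation the two active features should rank near the top of the semifinal list and survive the final fit, while a few noise features may still clear a loose `t_cut`; how SWA actually sets its subsample size and cutoffs is not stated in the abstract.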

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program