Abstract Details

Activity Number: 608
Type: Contributed
Date/Time: Wednesday, August 3, 2016 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #319485
Title: Subsampling for Feature Selection from Large Regression Data
Author(s): Yiying Fan* and Jiayang Sun
Companies: Cleveland State University and Case Western Reserve University
Keywords: Feature selection; Subsampling; Big data; Regression

Selecting features from a large number of candidates in a regression analysis remains a challenge for data science. We present a new subsampling method, called the Subsampling Winner Algorithm (SWA), for feature selection in large regression data. The central idea of our approach is analogous to the selection of national merit scholars. It applies a 'base procedure' to each of the subsamples, ranks all features with a scoring algorithm according to their performance in the subsample analyses, obtains the 'semifinalists' from the resulting scores, and finally determines the 'finalists', i.e., the important features, from among the semifinalists. Because of its subsampling nature, our procedure is applicable in principle to data of any dimension, including data too large for an existing software package to run a statistical procedure on the full data. We compare our procedure with other procedures, including the elastic net and SCAD, and illustrate an application of SWA to genomic data on ovarian cancer.
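
The abstract does not specify the base procedure, the scoring rule, or the cutoffs, so the following is only a minimal sketch of the subsample, score, semifinalist, finalist flow under stated assumptions, not the authors' implementation. It assumes that each subsample is a random subset of features, that the base procedure is ordinary least squares, that a feature's score is its average absolute t-statistic over the subsamples in which it appears, and that finalists are the semifinalists whose absolute t-statistic in a joint fit exceeds a cutoff. The function names and the parameters `s`, `m`, `q`, and `t_cut` are hypothetical, and the data are synthetic.

```python
import numpy as np

def ols_abs_t(X, y):
    """Absolute t-statistics of the slopes in an OLS fit with intercept."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - k - 1)       # residual variance estimate
    cov = sigma2 * np.linalg.pinv(A.T @ A)     # covariance of the estimates
    se = np.sqrt(np.diag(cov))
    return np.abs(beta[1:] / se[1:])           # drop the intercept

def subsampling_winner_sketch(X, y, s=10, m=500, q=10, t_cut=3.0, seed=0):
    """Hypothetical sketch: s features per subsample, m subsamples,
    q semifinalists, t_cut as the finalist threshold (all assumed)."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    totals = np.zeros(p)
    counts = np.zeros(p)
    for _ in range(m):
        idx = rng.choice(p, size=s, replace=False)   # one feature subsample
        totals[idx] += ols_abs_t(X[:, idx], y)       # assumed base procedure
        counts[idx] += 1
    scores = totals / np.maximum(counts, 1)          # assumed scoring rule
    semifinalists = np.argsort(scores)[::-1][:q]     # top-q scores
    t_joint = ols_abs_t(X[:, semifinalists], y)      # joint fit on semifinalists
    finalists = np.sort(semifinalists[t_joint > t_cut])
    return scores, semifinalists, finalists

# Synthetic example: 100 features, of which only 0, 5, and 20 affect y.
rng = np.random.default_rng(1)
n, p = 200, 100
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 5] + 2.5 * X[:, 20] + rng.standard_normal(n)

scores, semi, fin = subsampling_winner_sketch(X, y)
print("semifinalists:", np.sort(semi))
print("finalists:", fin)
```

Each base fit in this sketch involves only `s` features at a time, which is what would let a procedure of this kind run when the number of features is too large for a single fit on the full data.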

Authors who are presenting talks have a * after their name.

Back to the full JSM 2016 program

Copyright © American Statistical Association