Activity Number: 660 - Shrinkage Methods for Analyzing Complex Business Data
Type: Topic Contributed
Date/Time: Thursday, August 2, 2018, 10:30 AM to 12:20 PM
Sponsor: Business and Economic Statistics Section
Abstract #329222
Title: High-Dimensional Variable Selection When Features Are Sparse
Author(s): Jacob Bien* and Xiaohan Yan
Companies: University of Southern California and Cornell University
Keywords: lasso; feature selection; high-dimensional; sparse; rare; text data
Abstract:
It is common in modern prediction problems for many of the features to be counts of rarely occurring events. This leads to design matrices in which a large number of columns are highly sparse. The challenge posed by such "rare features" has received little attention despite its prevalence. We show, both theoretically and empirically, that not explicitly accounting for the rareness of features can greatly reduce the effectiveness of an analysis. We next propose a framework for aggregating rare features into denser features in a flexible manner that creates better predictors of the response. We apply our method to TripAdvisor data.
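
To make the idea of aggregating rare features concrete, here is a minimal sketch. It is not the authors' framework, which the abstract does not specify; it only illustrates how summing sparse count columns yields denser columns. The toy matrix `X`, the column grouping `groups`, and the helper names `density` and `aggregate` are all assumptions introduced for illustration.

```python
import numpy as np

# Toy design matrix (hypothetical data): 6 observations, 4 rare count features.
X = np.array([
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 2],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
])

# Hypothetical grouping of related columns (e.g. similar words in text data).
# This grouping is assumed here; the abstract does not say how groups are chosen.
groups = [[0, 1], [2, 3]]


def density(M):
    """Return the fraction of nonzero entries in each column."""
    return (M != 0).mean(axis=0)


def aggregate(M, groups):
    """Sum the count columns within each group to form one feature per group."""
    return np.column_stack([M[:, g].sum(axis=1) for g in groups])


X_agg = aggregate(X, groups)

print("density before aggregation:", density(X))
print("density after aggregation: ", density(X_agg))
```

In this toy case each aggregated column has more nonzero entries than any of the columns it was built from, which is the sense in which aggregated features are denser. How to choose the groups in a data-driven, flexible way is the question the talk addresses, and the fixed grouping above does not attempt to answer it.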
Authors who are presenting talks have a * after their name.