All Times EDT

Abstract Details

Activity Number: 355 - Modern Model Selection
Type: Contributed
Date/Time: Wednesday, August 5, 2020 : 10:00 AM to 2:00 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #312275
Title: Controlling Costs: Feature Selection on a Budget
Author(s): Guo Yu* and Daniela Witten and Jacob Bien
Companies: University of Washington and University of Washington and University of Southern California
Keywords: weighted false discovery proportion; cost; feature selection; multiple testing; knockoff filter
Abstract:

The traditional framework for feature selection treats all features as costing the same amount. In reality, however, a scientist often has considerable discretion over which variables to measure, and the decision involves a tradeoff between model accuracy and cost (where cost can refer to money, time, difficulty, or intrusiveness). In particular, unnecessarily including an expensive feature in a model is worse than unnecessarily including a cheap feature. We propose a procedure, based on multiple knockoffs, for performing feature selection in a cost-conscious manner. The key idea behind our method is to force higher-cost features to compete with more knockoffs than cheaper features. We derive an upper bound on the weighted false discovery proportion associated with this procedure; this proportion is the fraction of the selected features' cost that is wasted on unimportant features. We prove that this bound holds simultaneously with high probability over a path of selected variable sets of increasing size. In a simulation study, we investigate the practical importance of incorporating cost considerations into the feature selection process.
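
To make the weighted false discovery proportion concrete, here is a minimal Python sketch that computes it as described in the abstract: the cost of the selected unimportant features divided by the total cost of all selected features. The function name `weighted_fdp`, the feature names, the cost values, and the convention of returning 0 for an empty selection are all assumptions made for illustration; they are not taken from the authors' work, and the sketch does not implement the knockoff-based selection procedure itself.

```python
from typing import Hashable, Iterable, Mapping, Set


def weighted_fdp(
    selected: Iterable[Hashable],
    costs: Mapping[Hashable, float],
    important: Set[Hashable],
) -> float:
    """Fraction of the selected features' total cost spent on unimportant features.

    Assumption: an empty (or zero-cost) selection wastes nothing, so it returns 0.0.
    """
    chosen = set(selected)
    total_cost = sum(costs[j] for j in chosen)
    if total_cost == 0:
        return 0.0
    wasted_cost = sum(costs[j] for j in chosen if j not in important)
    return wasted_cost / total_cost


# Hypothetical costs (arbitrary units) and a hypothetical set of truly important features.
costs = {"age": 1.0, "blood_panel": 20.0, "mri": 200.0, "survey": 4.0}
important = {"age", "blood_panel"}

# Both selections contain exactly one unimportant feature out of three,
# so their unweighted false discovery proportion is the same (1/3).
cheap_mistake = weighted_fdp(["age", "blood_panel", "survey"], costs, important)
expensive_mistake = weighted_fdp(["age", "blood_panel", "mri"], costs, important)

print(f"cheap false discovery:     {cheap_mistake:.3f}")
print(f"expensive false discovery: {expensive_mistake:.3f}")
```

The two selections differ only in which unimportant feature was included, yet the weighted proportion is far larger when the wrongly included feature is the expensive one, which matches the abstract's point that unnecessarily including an expensive feature is worse than unnecessarily including a cheap one.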


Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program