Abstract Details

Activity Number: 184 - SPEED: Variable Selection and Networks
Type: Contributed
Date/Time: Monday, July 31, 2017 : 11:35 AM to 12:20 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #325268
Title: Assessing Variable Importance Nonparametrically Using Machine Learning Techniques
Author(s): Brian Williamson* and Marco Carone and Noah Simon and Peter Gilbert
Companies: University of Washington and University of Washington Department of Biostatistics and University of Washington and Fred Hutchinson Cancer Research Center
Keywords: variable importance ; machine learning ; targeted learning ; nonparametric statistics ; statistical inference
Abstract:

In a regression problem, it is often of interest to measure the "importance" of each feature in predicting the response. Classical variable importance methods trade off flexibility against inference: either the method works only for parametric models and allows inference, or it accommodates flexible estimation procedures but does not allow inference and generally lacks well-understood asymptotic properties. We propose an extension of ANOVA that can be applied with general, complex machine-learning-based prediction methods to flexibly estimate the additional proportion of the total variability in the outcome explained by a single feature or group of features. Using the tools of targeted learning, we show that, under some conditions, we obtain efficient estimates of variable importance with asymptotically valid confidence intervals while using any flexible estimation procedure. We demonstrate the performance of this ANOVA extension in a study of the median house price in the Boston area and in a study of risk factors for cardiovascular disease in South Africa.
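To make the quantity "additional proportion of the total variability in the outcome explained" concrete, the sketch below computes a naive plug-in version of it: the drop in mean squared error when moving from a reduced fit (feature of interest left out) to a full fit (all features), divided by the variance of the outcome. This is an illustration under assumptions, not the authors' estimator: the function names are hypothetical, the predictions are assumed to come from any regression procedure fitted elsewhere, and no efficiency correction or confidence interval from targeted learning is included.

```python
from statistics import fmean, pvariance


def mse(y, pred):
    """Mean squared error between observed outcomes and predictions."""
    return fmean([(yi - pi) ** 2 for yi, pi in zip(y, pred)])


def r2_difference_importance(y, full_pred, reduced_pred):
    """Naive plug-in importance (hypothetical helper, not the authors' method).

    y            -- observed outcomes
    full_pred    -- predictions from a fit that uses all features
    reduced_pred -- predictions from a fit that omits the feature(s) of interest

    Returns (MSE_reduced - MSE_full) / Var(y), i.e. the difference in
    R-squared between the full and the reduced fit.
    """
    total_var = pvariance(y)
    if total_var == 0:
        raise ValueError("outcome has zero variance; importance is undefined")
    return (mse(y, reduced_pred) - mse(y, full_pred)) / total_var


# Toy illustration with made-up numbers: the full fit predicts perfectly,
# while the reduced fit can only predict the overall mean of y.
y = [1.0, 2.0, 3.0, 4.0]
full_pred = [1.0, 2.0, 3.0, 4.0]
reduced_pred = [2.5, 2.5, 2.5, 2.5]
importance = r2_difference_importance(y, full_pred, reduced_pred)
print(importance)
```

In this toy case the omitted feature accounts for all of the explained variability, so the value is 1; when the full and reduced predictions coincide, the value is 0. In practice the two sets of predictions would typically be evaluated on held-out data to limit overfitting.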


Authors who are presenting talks have a * after their name.

Back to the full JSM 2017 program

 
 