Abstract:
|
Many current variable importance (VI) methods are not comparable across model types, or can give seemingly incoherent results when multiple prediction models fit the data well. We propose a framework of VI measures for describing how much any model class, any model-fitting algorithm, or any individual prediction model relies on covariate(s) of interest. The building block of our approach, Model Reliance (MR), compares a prediction model's expected loss with that model's expected loss on a pair of observations in which the value of the covariate of interest has been switched. Expanding on MR, we propose Model Class Reliance (MCR) as the upper and lower bounds on the degree to which any well-performing prediction model within a class may rely on a variable of interest. We give probabilistic bounds for MR and MCR, using existing results for U-statistics. We also illustrate connections between MR, conditional causal effects, and linear regression coefficients. We then apply MR and MCR to study the behavior of recidivism prediction models, using a public dataset of Broward County criminal records.
|
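To make the MR comparison concrete, the following is a minimal sketch of one way an empirical version could be computed. It assumes the comparison is taken as a ratio of the switched loss to the standard loss, and that the switched loss is averaged over all ordered pairs of distinct observations (a U-statistic-style average); the abstract states neither choice explicitly. The function names, the squared-error loss, and the toy data are hypothetical and chosen only for illustration.

```python
import itertools


def squared_error(y, prediction):
    """Illustrative loss function (an assumption; any loss could be used)."""
    return (y - prediction) ** 2


def empirical_model_reliance(model, x1, x2, y, loss=squared_error):
    """Ratio of switched loss to standard loss for a fixed model.

    x1: values of the covariate of interest
    x2: values of the remaining covariate(s)
    y:  observed outcomes
    model: callable taking (x1_value, x2_value) and returning a prediction
    """
    n = len(y)
    # Standard loss: average loss on the observed data.
    e_orig = sum(loss(y[i], model(x1[i], x2[i])) for i in range(n)) / n
    # Switched loss: for every ordered pair (i, j) with i != j, keep the
    # outcome and remaining covariates of i but take x1 from j.
    e_switch = sum(
        loss(y[i], model(x1[j], x2[i]))
        for i, j in itertools.permutations(range(n), 2)
    ) / (n * (n - 1))
    return e_switch / e_orig


# Toy data (hypothetical): the outcome depends strongly on x1.
x1 = [0.0, 1.0, 2.0, 3.0]
x2 = [1.0, 0.0, 1.0, 0.0]
y = [0.5, 1.0, 2.5, 3.0]

# A model that predicts with x1 only, and a model that ignores x1 entirely.
mr_uses_x1 = empirical_model_reliance(lambda a, b: a, x1, x2, y)
mr_ignores_x1 = empirical_model_reliance(lambda a, b: 1.75, x1, x2, y)
```

Under these assumptions, a ratio well above 1 indicates that the model's loss grows when the covariate is switched, while a model that never uses the covariate has a ratio of 1, since switching leaves its predictions unchanged.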