Abstract:
|
It is well recognized that many machine learning methods perform very well in a wide variety of applications, such as virtual personal assistants and online customer support, and are benefiting people's lives. However, most machine learning methods do not provide a measure of variable importance, and this is often a barrier to interpreting their results. In this talk, we present two types of variable importance measures. Given any specific method, CVIL measures, from a cross-validation perspective, the relative change in the model's predictive performance when a variable is either deleted from the data set or replaced with a constant; these two operations give the two types of measures. Under some mild conditions, CVIL is consistent in the sense that it converges to the theoretical variable importance as the sample size grows. We also construct confidence intervals to show the reliability of the proposed CVIL importance measure. Through simulations and real data examples, we show that CVIL provides a ranking of variable importance for any seemingly uninterpretable predictive algorithm, such as random forest.
|
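The abstract does not give the exact CVIL formula, so the following Python sketch illustrates only one plausible reading of the deletion and constant-replacement idea. It assumes that importance is the relative increase in K-fold cross-validated mean squared error, (CV error without variable j minus CV error with all variables) divided by CV error with all variables, and that the "constant" is the variable's sample mean. The learner (least squares with an intercept), the fold count, and the names `kfold_mse`, `ols_fit_predict`, and `cv_importance` are hypothetical choices for illustration, not the authors' definitions; any other `fit_predict` callable, such as a random forest wrapper, could be passed in.

```python
import numpy as np


def kfold_mse(fit_predict, X, y, k=5, seed=0):
    """K-fold cross-validated mean squared error of a generic learner."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    sq_err = np.empty(len(y))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        sq_err[test_idx] = (y[test_idx] - pred) ** 2
    return sq_err.mean()


def ols_fit_predict(X_train, y_train, X_test):
    """Hypothetical example learner: least squares with an intercept."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef


def cv_importance(fit_predict, X, y, mode="delete", k=5, seed=0):
    """Assumed CVIL-style score: relative increase in CV error per variable.

    mode="delete" drops column j; mode="constant" sets it to its mean.
    The same seed gives the same folds, so every comparison is paired.
    """
    base = kfold_mse(fit_predict, X, y, k, seed)
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        if mode == "delete":
            X_j = np.delete(X, j, axis=1)
        elif mode == "constant":
            X_j = X.copy()
            X_j[:, j] = X[:, j].mean()
        else:
            raise ValueError("mode must be 'delete' or 'constant'")
        scores[j] = (kfold_mse(fit_predict, X_j, y, k, seed) - base) / base
    return scores
```

Usage on simulated data where only the first two variables matter: `cv_importance(ols_fit_predict, X, y)` should rank the variable with the largest coefficient first and give a score near zero to the irrelevant one. Reusing the same folds for the full and reduced models is a design choice that reduces the variance of the difference; the confidence intervals mentioned in the abstract are not reproduced here.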