Abstract:
|
Bayesian Additive Regression Trees (BART) is a popular prediction method for modeling phenomena in the physical and engineering sciences that exhibit complex nonlinearities. Compared with other prediction methods such as neural networks, BART has demonstrated competitive predictive accuracy while relying on a more transparent underlying statistical model. A key feature of BART that is useful in the study of complex processes is its ability to identify important variables. A commonly used heuristic for measuring variable importance in BART is to count the number of nodes in the ensemble that split on each variable: the more frequently a variable is split on, the greater its presumed importance. This method, though seemingly crude, is easy to interpret and compute. In this poster, we explore the relationship between variables identified as important by count methods and their variance-based main-effect sensitivities. We show that the two methods are intrinsically connected and that the former can be thought of as a “blind version” of the latter that is invariant to certain properties of the ensemble. We then assess whether the count method accurately measures variable importance or whether a new heuristic is needed.
|