Abstract:
|
Tree ensembles are among the most popular and successful machine learning models due to their high prediction accuracy. Their shortcomings lie in difficulty of interpretation and drawing insights. While contributing tree models are easy to interpret, this transparency is lost when the tree models are combined into an ensemble. In this article, we describe a method to detect useful decision rules from a given tree ensemble. We exploit the fact that a tree ensemble offers a very large pool of interpretable decision rules. These decision rules can be used as basis for discovery of direct insights into important relationships supported by the ensemble. Novel metrics are proposed to select the top decision rules that are both the most interesting and consistent with the ensemble predictions. Interestingness that we consider is high prediction accuracy for categorical targets and high difference from the overall average for continuous targets. Consistency is defined in terms of predictions generated by a decision rule and the ensemble. We demonstrate the effectiveness of our approach through an example.
|