Many of the best classifiers are ensemble methods such as bagging, random forests, boosting, and Bayes model averaging. We give conditions under which each of these four classifiers can be regarded as a Bayes classifier. We also give conditions under which stacking achieves the minimal Bayes risk.
We compare the four classifiers with a logistic regression classifier to assess the cost of interpretability. First, we characterize the increase in risk from using an ensemble method inside a logistic classifier versus using the ensemble method directly. Second, we characterize the change in risk from applying logistic regression to an ensemble method versus using the logistic classifier itself. Third, we give necessary and sufficient conditions for the logistic classifier to be worse than a combination of the logistic classifier and the Bayes classifier; hence the result extends to ensemble classifiers that are asymptotically Bayes.