Abstract:
|
We introduce a decision tree algorithm that simultaneously reduces tree complexity, improves class prediction, and enhances data visualization. We accomplish this by fitting a bivariate linear discriminant model to the data in each node. Standard algorithms tend to produce fairly complex tree structures because they employ a very simple node model, in which the entire partition associated with a node is assigned to a single class. We reduce the size of our trees by letting the discriminant models absorb some of the tree complexity. Being classifiers themselves, the discriminant models can also help to improve prediction accuracy. Our algorithm does not simply fit discriminant models to the terminal nodes of a pruned tree, as this does not reduce the size of the tree. Instead, discriminant modeling is carried out in all phases of tree growth, and the misclassification costs of the node models are explicitly used to prune the tree. An extensive empirical study with real data sets shows that our algorithm generally predicts more accurately than many other tree and non-tree algorithms.
|