Abstract:
|
This paper introduces new tools in the R package PPforest for visualizing projection pursuit classification random forest objects. PPforest is an ensemble learning method that adapts the classic random forest to use linear combinations of variables, as produced by projection pursuit, in the tree construction. Using linear combinations of variables to separate classes takes the correlation between variables into account, and can outperform the classic forest when separations between groups exist in combinations of variables. PPforest is available at \url{https://github.com/natydasilva/PPforest}. Visualization helps in understanding the class structure in the data and how the model fits it. Because a PPforest is composed of many trees fit on subsets of the data, many statistics are calculated, and these essentially form a separate data set. Some of the diagnostics of interest are the same as in the classic forest but are calculated differently: variable importance, out-of-bag (OOB) error rate, vote matrix and proximity matrix. Static, dynamic and interactive plots of these diagnostics, linked to the training data, are used to better understand the fitted PPforest model.
|