Abstract:
|
Tree-based methods have become one of the most flexible, intuitive, and powerful data analytic tools for exploring complex data structures. The best documented, and arguably most popular uses of tree-based methods are in biomedical research, where multivariate outcomes occur commonly (e.g. diastolic and systolic blood pressure, periodontal measures in dental health studies, and nerve density measures in studies of neuropathy). Existing tree-based methods for multivariate outcomes do not appropriately take into account the correlation that exists in the outcomes. In this paper, we develop three goodness of split measures for multivariate tree building for continuous outcomes. The proposed measures utilize specific aspects of the variance-covariance matrix at each potential split. Additionally, to enhance prediction accuracy we extend the single multivariate regression tree to an ensemble of trees. Extensive simulations are presented to examine the properties of our goodness of fit measures. Finally, the proposed methods are illustrated using two clinical datasets of neuropathy and pediatric cardiac surgery.
|