361 – Applications of Ensemble and Tree-Based Methods
An Approach to the Multivariate Two-Sample Problem Using Classification and Regression Trees and Minimum-Weight Spanning Subgraphs
David M. Ruth
United States Naval Academy
Samuel E. Buttrey
Naval Postgraduate School
Lyn R. Whitaker
Naval Postgraduate School
The multivariate two-sample problem is one of continued interest in statistics. Approaches to this problem normally require a dissimilarity measure on the observation sample space; such measures are typically restricted to numeric variables. To accommodate both categorical and numeric variables, we use a new dissimilarity measure based on a set of classification and regression trees. We briefly discuss this new measure and then incorporate it into a recently developed graph-based multivariate test. The test statistic counts the number of intergroup edges in a minimum-weight regular spanning subgraph; unequal distributions will tend to result in fewer edges in this count. Test performance is examined via a simulation study, and test efficacy is investigated using real-world data.
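
The edge-count statistic can be illustrated with a minimal sketch under two simplifying assumptions that are not taken from the abstract: the regular spanning subgraph has degree 1 (a perfect matching), and plain Euclidean distance stands in for the tree-based dissimilarity. The function names `min_weight_perfect_matching` and `intergroup_edge_count` are hypothetical, and the exact bitmask search is only practical for a small, even number of observations.

```python
import math
from functools import lru_cache


def min_weight_perfect_matching(dist):
    """Exact minimum-weight perfect matching via bitmask DP (small even n only)."""
    n = len(dist)
    if n % 2:
        raise ValueError("need an even number of observations")

    @lru_cache(maxsize=None)
    def best(mask):
        # mask has bit k set when vertex k is still unmatched
        if mask == 0:
            return 0.0, ()
        i = (mask & -mask).bit_length() - 1   # lowest unmatched vertex
        rest = mask & ~(1 << i)
        best_cost, best_pairs = math.inf, ()
        candidates = rest
        while candidates:
            j = (candidates & -candidates).bit_length() - 1
            candidates &= candidates - 1      # drop j from the candidates
            cost, pairs = best(rest & ~(1 << j))
            cost += dist[i][j]
            if cost < best_cost:
                best_cost, best_pairs = cost, ((i, j),) + pairs
        return best_cost, best_pairs

    return list(best((1 << n) - 1)[1])


def intergroup_edge_count(points, labels):
    """Count matched pairs whose endpoints carry different sample labels."""
    # Assumption: Euclidean distance replaces the tree-based dissimilarity.
    dist = [[math.dist(p, q) for q in points] for p in points]
    matching = min_weight_perfect_matching(dist)
    return sum(labels[i] != labels[j] for i, j in matching)


# Two well-separated samples: every matched pair stays within its own sample.
separated = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
separated_labels = ["A"] * 4 + ["B"] * 4
print(intergroup_edge_count(separated, separated_labels))  # 0

# Two overlapping samples: each point's nearest neighbor is in the other sample.
mixed = [(0, 0), (2, 0), (0, 0.1), (2, 0.1)]
mixed_labels = ["A", "A", "B", "B"]
print(intergroup_edge_count(mixed, mixed_labels))  # 2
```

In this toy setting, the separated samples give a count of 0 and the overlapping samples give a count of 2, matching the statement that unequal distributions tend to produce fewer intergroup edges. A reference distribution for the count, for example from permuting the sample labels, would still be needed to turn it into a test; that step is omitted here.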