Recursive partitioning algorithms used to build regression tree models require a method for selecting splits based on the values of a set of candidate variables. Tests for equality of model parameters between two groups can be used to estimate p-values for potential splits. Permutation tests are a distribution-free way to conduct these hypothesis tests.
The probability estimates obtained from permutation tests rely on the condition that the observed data are interchangeable among the groups being tested. Though this assumption is easily satisfied for data obtained from a simple random sample or through a controlled study, it is often violated by survey data collected using a complex sample design. In this article, we propose a method for performing a permutation test that accounts for the complex sample design. Tests on a simulated population, comparing the proposed method with permutation tests that ignore the sample design, demonstrate that certain design features must be accounted for in order to obtain accurate p-value estimates. The method is applied to a regression tree algorithm modeling U.S. consumer expenditure data.
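For orientation, the following minimal sketch shows an ordinary permutation test for a single candidate split. It is not the proposed design-based method. It assumes the observations are interchangeable between the two groups, which is the condition that complex sample designs tend to violate. The function name `permutation_p_value`, the choice of the absolute difference in group means as the test statistic, and the default of 2000 permutations are illustrative assumptions, not details taken from the article.

```python
import random


def permutation_p_value(left, right, n_perm=2000, seed=0):
    """Estimate a two-sided permutation p-value for a difference in means.

    Assumes observations are interchangeable between the two groups
    (e.g. a simple random sample); the sample design is ignored here.
    """
    rng = random.Random(seed)
    n_left = len(left)

    def mean_gap(a, b):
        # Test statistic: absolute difference between group means.
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = mean_gap(left, right)
    pooled = list(left) + list(right)

    extreme = 0
    for _ in range(n_perm):
        # Randomly reassign group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        if mean_gap(pooled[:n_left], pooled[n_left:]) >= observed:
            extreme += 1

    # Add-one correction counts the observed labelling itself,
    # so the estimate is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical responses on either side of a candidate split.
    left_node = [1, 2, 3, 4, 5, 6, 7, 8]
    right_node = [11, 12, 13, 14, 15, 16, 17, 18]
    print(permutation_p_value(left_node, right_node))
```

In a tree-growing algorithm, a p-value of this kind could be computed for each candidate split and the split with the smallest value retained. With survey data from a complex design, the unrestricted shuffle above is the step that would need to change.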