582 – Stratification and Other Survey Sampling Theory
A Hierarchical Clustering Algorithm for Multivariate Stratification in Stratified Sampling
Jaekwang Kim
Iowa State University
Sarah Nusser
Iowa State University
Stephanie Zimmer
Stratification is used in sampling to create homogeneous groups. A number of methods have been proposed for stratifying populations using covariates of the variable of interest. These include Dalenius and Hodges’ (1959) cumulative root frequency method, the Lavallée and Hidiroglou (1988) algorithm, and the Gunning and Horgan (2004) geometric stratification method. All of these methods assume a single variable of interest and a single correlated auxiliary variable known for every element of the population. Many surveys, however, have more than one important variable of interest as well as many auxiliary variables. The method we propose considers multiple variables of interest. We use a superpopulation model to create a distance metric between elements in the population that depends on multiple auxiliary variables. Using the proposed metric, a hierarchical clustering algorithm can implement the optimal stratification automatically by successively combining the closest elements into strata. Our method is motivated by the NASS June Area Survey (JAS), where we have multiple auxiliary variables with which to stratify sample segments and want to make estimates for several crop and livestock parameters.
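The general idea of forming strata by hierarchical clustering on several auxiliary variables can be sketched in a few lines. This is only an illustration under stated assumptions, not the method of the abstract: the model-based distance metric is not given here, so the sketch substitutes Euclidean distance on standardized auxiliary variables with Ward linkage from SciPy. The function name `stratify` and the simulated population are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def stratify(aux, n_strata):
    """Assign each population element to one of at most n_strata strata.

    aux is an (N, p) array of auxiliary variables known for all N elements.
    ASSUMPTION: Euclidean distance on standardized columns stands in for
    the superpopulation-model-based metric, which is not specified here.
    """
    aux = np.asarray(aux, dtype=float)
    std = aux.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled
    z = (aux - aux.mean(axis=0)) / std
    # Agglomerative clustering: start with every element as its own cluster
    # and repeatedly merge the pair whose merger least increases the
    # within-cluster sum of squares (Ward's criterion).
    tree = linkage(z, method="ward")
    # Cut the merge tree so that no more than n_strata clusters remain.
    return fcluster(tree, t=n_strata, criterion="maxclust")


# Hypothetical population: three well-separated groups, two auxiliary variables.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
population = np.vstack(
    [c + rng.normal(scale=0.5, size=(50, 2)) for c in centers]
)
strata = stratify(population, n_strata=3)
```

Here `strata` holds one integer stratum label per population element. Swapping in a different metric would amount to passing a condensed distance matrix to `linkage` with a linkage method suited to that metric, since Ward linkage presumes Euclidean distances.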