Abstract:
|
For many of the demographic surveys it conducts, the Census Bureau uses a two-stage sample design, in which the Primary Sampling Units (PSUs) are counties or groups of counties and the second-stage sampling units are households selected from within the sampled PSUs. To reduce sampling variance in the first stage, we stratify the PSUs. Beginning with the 1980 sample redesign, the Census Bureau has used a method of PSU stratification based on a clustering algorithm described in a 1967 article by Friedman and Rubin. The Friedman-Rubin algorithm has been described as a ``greedy hill-climbing heuristic.'' In the 2010 sample redesign, by contrast, the Consumer Expenditure Survey used an approach based on $k$-means clustering and an iterative application of constrained integer programming optimization. We refer to this alternative approach as the ``King method,'' after the primary author of an article in the proceedings of the 2011 Joint Statistical Meetings that describes it. This paper compares the two methods by creating stratifications with each method for five different surveys and evaluating the results using two alternative metrics.
|