Abstract Details

Activity Number: 357 - Issues in Survey Design and Estimation
Type: Contributed
Date/Time: Tuesday, July 31, 2018 : 10:30 AM to 12:20 PM
Sponsor: Survey Research Methods Section
Abstract #329298
Title: A Comparison of Clustering Algorithms Used for Multivariate Stratification of Primary Sampling Units
Author(s): Thomas Chesnut* and Padraic Murphy
Companies: U.S. Census Bureau and U.S. Census Bureau
Keywords: clustering; integer programming; greedy algorithm; multi-stage sampling; stratification

For many of the demographic surveys it conducts, the Census Bureau uses a two-stage sample design, in which the primary sampling units (PSUs) are counties or groups of counties and the second-stage sampling units are households selected from within the sampled PSUs. To reduce sampling variance in the first stage, we stratify the PSUs. Beginning with the 1980 sample redesign, the Census Bureau has used a method of PSU stratification based on a clustering algorithm described in a 1967 article by Friedman and Rubin. The Friedman-Rubin algorithm has been described as a "greedy hill-climbing heuristic." Alternatively, in the 2010 sample redesign, the Consumer Expenditure Survey used an approach based on k-means clustering and an iterative application of constrained integer programming optimization. We refer to this alternative approach as the "King method," after the primary author of an article in the proceedings of the 2011 Joint Statistical Meetings describing the approach. This paper compares the two methods by creating stratifications with each method for five different surveys and evaluating the results with two alternative metrics.

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program