Online Program

Friday, May 18
Computing Science
Distinguished Students of Edward Wegman
Fri, May 18, 1:30 PM - 3:00 PM
Grand Ballroom D
 

Modeling Topics in Survey Interviewer Notes (304355)

Presentation

*Wendy Martinez, U.S. Bureau of Labor Statistics
Terrance Savitsky, U.S. Bureau of Labor Statistics

Keywords: Hierarchical Dirichlet process, document clustering, Bayesian models

Government surveys of households and establishments typically include inputs collected from interviewer notes, which provide a rich source of context and information. We propose to extract themes from the collection of interviewer notes (our documents) using a scalable optimization method based on non-parametric mixtures of hierarchical Dirichlet processes. The method allows discovery of multiple local, document-specific themes that are linked to a set of global themes. Survey data are typically acquired under an informative sampling design, in which the probability of inclusion depends on the surveyed response, so that the distribution of the observed sample differs from that of the population. We use a pseudo-posterior in which sampling weights differentially weight the contributions of the document likelihoods to “undo” the informative design, so that we estimate the distribution of themes with respect to the population of establishments or households from which our sample was drawn. The method is applied to the Consumer Expenditure Survey conducted by the Bureau of Labor Statistics.
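
The weighting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not fit a hierarchical Dirichlet process; it only shows how per-document log-likelihoods might be combined under sampling weights. The function names, the example numbers, and the choice to rescale inverse inclusion probabilities so they sum to the sample size are all assumptions made for illustration.

```python
def normalized_weights(inclusion_probs):
    """Inverse-probability weights, rescaled to sum to the sample size n.

    Assumption: weights are 1 / inclusion probability, normalized so the
    weighted likelihood carries the same total information as n documents.
    """
    for p in inclusion_probs:
        if not 0.0 < p <= 1.0:
            raise ValueError("inclusion probabilities must lie in (0, 1]")
    raw = [1.0 / p for p in inclusion_probs]
    n = len(raw)
    total = sum(raw)
    return [n * w / total for w in raw]


def pseudo_log_likelihood(doc_logliks, inclusion_probs):
    """Weighted sum of per-document log-likelihoods.

    Each document's log-likelihood is multiplied by its normalized weight,
    which is equivalent to raising its likelihood to the power of the weight.
    """
    if len(doc_logliks) != len(inclusion_probs):
        raise ValueError("need one inclusion probability per document")
    weights = normalized_weights(inclusion_probs)
    return sum(w * ll for w, ll in zip(weights, doc_logliks))


# Hypothetical values: three documents with made-up log-likelihoods
# under some fixed theme model, and made-up inclusion probabilities.
doc_logliks = [-12.0, -8.5, -20.0]
inclusion_probs = [0.5, 0.25, 0.25]

weights = normalized_weights(inclusion_probs)
total = pseudo_log_likelihood(doc_logliks, inclusion_probs)
print(weights, total)
```

Documents that were less likely to be sampled receive larger weights, so they count for more in the estimate. When all inclusion probabilities are equal, every weight is 1 and the result reduces to the ordinary unweighted log-likelihood.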