Abstract Details

Activity Number: 463 - Novel Uses of Text Analysis in Government Agencies
Type: Topic Contributed
Date/Time: Wednesday, August 1, 2018 : 8:30 AM to 10:20 AM
Sponsor: Government Statistics Section
Abstract #327275 Presentation
Title: Identifying Misclassifications in Consumer Expenditure Data
Author(s): Clayton Knappenberger*
Companies: U.S. Bureau of Labor Statistics
Keywords: text analysis; classification error; machine learning

Classification error occurs when a survey response is recorded in an incorrect category. This is a common and well-studied problem in survey data. When classification error is correlated with other explanatory variables of interest, it can bias estimates. Typically, researchers have relied on subsequent re-interviews or on administrative data to estimate the presence and extent of classification error. This report presents a novel approach to estimating and addressing classification error in the Bureau of Labor Statistics' Consumer Expenditure Quarterly Interview (CEQ) data. I use respondent-provided text descriptions of purchases to train a predictive model whose output is a predicted item category for each expenditure. This prediction is then compared to the reported category to identify likely cases of item misclassification. I estimate a classification error rate of approximately 6% for a single expenditure category, and generate new expenditure estimates for that category after correcting for identified misclassifications.
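The predict-and-compare idea can be illustrated with a minimal sketch. The abstract does not name the model or the data layout, so everything here is an assumption for illustration only: a small multinomial naive Bayes classifier written in plain Python stands in for the predictive model, the training records are assumed to carry trusted category labels, and all descriptions, categories, dollar amounts, and function names (`train_naive_bayes`, `predict`, `flag_misclassified`) are hypothetical. The totals are unweighted, whereas real survey estimates would normally apply survey weights.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Simplest possible tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def train_naive_bayes(records):
    """Count documents and words per category from (description, category) pairs."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, category in records:
        tokens = tokenize(text)
        doc_counts[category] += 1
        word_counts[category].update(tokens)
        vocab.update(tokens)
    return doc_counts, word_counts, vocab

def predict(model, text):
    """Return the most probable category, using add-one (Laplace) smoothing."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    # Words never seen in training carry no information here, so skip them.
    tokens = [t for t in tokenize(text) if t in vocab]
    best_category, best_score = None, -math.inf
    for category in sorted(doc_counts):
        n_words = sum(word_counts[category].values())
        score = math.log(doc_counts[category] / total_docs)
        for token in tokens:
            count = word_counts[category][token]
            score += math.log((count + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category

def flag_misclassified(model, records):
    """Return (description, reported, predicted) where the two categories disagree."""
    flagged = []
    for text, reported, _amount in records:
        predicted = predict(model, text)
        if predicted != reported:
            flagged.append((text, reported, predicted))
    return flagged

# Hypothetical training data with labels assumed to be correct.
training = [
    ("mens dress shirt", "clothing"),
    ("womens winter coat", "clothing"),
    ("kids shirt and jeans", "clothing"),
    ("blue jeans", "clothing"),
    ("winter coat for son", "clothing"),
    ("kitchen table", "furniture"),
    ("leather sofa", "furniture"),
    ("dining table and chairs", "furniture"),
    ("office chairs", "furniture"),
    ("sofa and coffee table", "furniture"),
]

# Hypothetical reported expenditures: (description, reported category, amount).
reported_records = [
    ("dress shirt", "furniture", 40.0),   # deliberately misreported
    ("coffee table", "furniture", 150.0),
    ("winter jeans", "clothing", 60.0),
    ("leather sofa", "furniture", 900.0),
]

model = train_naive_bayes(training)
flagged = flag_misclassified(model, reported_records)
error_rate = len(flagged) / len(reported_records)

# Unweighted furniture total as reported, and after replacing each
# reported category with the model's predicted category.
reported_total = sum(a for _t, c, a in reported_records if c == "furniture")
corrected_total = sum(
    a for t, _c, a in reported_records if predict(model, t) == "furniture"
)

print(flagged, error_rate, reported_total, corrected_total)
```

In this toy data the share of flagged records is far higher than the roughly 6% reported in the abstract. The sketch only shows the mechanics of comparing predicted and reported categories and recomputing a category total. It is not a reconstruction of the author's model or results.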

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program