Name: 2018 Joint Statistical Meetings
Start: 2018-07-28T07:00:00+00:00
End: 2018-08-02
Location: Vancouver Convention Centre

Abstract Details

Activity Number:	463 - Novel Uses of Text Analysis in Government Agencies
Type:	Topic Contributed
Date/Time:	Wednesday, August 1, 2018 : 8:30 AM to 10:20 AM
Sponsor:	Government Statistics Section
Abstract #327275	Presentation
Title:	Identifying Misclassifications in Consumer Expenditure Data
Author(s):	Clayton Knappenberger*
Companies:	U.S. Bureau of Labor Statistics
Keywords:	text analysis; classification error; machine learning
Abstract:	Classification error occurs when a survey response is recorded in an incorrect category. This is a common and well-studied problem in survey data. When classification error is correlated with other explanatory variables of interest, it can bias estimates. Typically, researchers have relied on subsequent re-interviews or on administrative data to estimate the presence and extent of classification error. This report presents a novel approach to estimating and addressing classification error in the Bureau of Labor Statistics' Consumer Expenditure Quarterly Interview (CEQ) data. I use respondent-provided text descriptions of purchases to train a predictive model whose output is a predicted item category for each expenditure. This prediction is then compared to the reported category to identify likely cases of item misclassification. I estimate a classification error rate of approximately 6% for a single expenditure category, and generate new expenditure estimates for that category after correcting for identified misclassifications.
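
The predict-then-compare step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which model was used, so everything here is an assumption made for illustration: a small multinomial Naive Bayes classifier written from scratch, leave-one-out prediction so that no record is scored by a model trained on its own reported label, and a handful of invented purchase descriptions (not CEQ data). The names `records`, `train_nb`, `predict_nb`, and `flag_misclassified` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Invented toy records (NOT CEQ data): (text description, reported category).
records = [
    ("mens cotton shirt", "clothing"),
    ("womens cotton dress", "clothing"),
    ("kids winter jacket", "clothing"),
    ("mens winter jacket", "clothing"),
    ("cotton dress shirt", "clothing"),
    ("womens cotton shirt", "clothing"),
    ("microwave oven", "appliances"),
    ("washing machine", "appliances"),
    ("dish washing machine", "appliances"),
    ("toaster oven", "appliances"),
    ("mens cotton shirt", "appliances"),  # deliberately misreported for the demo
]

def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()

def train_nb(training):
    """Collect the counts needed for multinomial Naive Bayes."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, category in training:
        class_counts[category] += 1
        for token in tokenize(text):
            word_counts[category][token] += 1
            vocab.add(token)
    return class_counts, word_counts, vocab

def predict_nb(model, text):
    """Return the category with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_category, best_score = None, -math.inf
    for category in sorted(class_counts):
        score = math.log(class_counts[category] / total_docs)
        denom = sum(word_counts[category].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # tokens unseen in training are ignored
                score += math.log((word_counts[category][token] + 1) / denom)
        if score > best_score:
            best_category, best_score = category, score
    return best_category

def flag_misclassified(data):
    """Leave-one-out: predict each record from all the others and
    flag it when the prediction disagrees with the reported category."""
    flagged = []
    for i, (text, reported) in enumerate(data):
        training = data[:i] + data[i + 1:]
        predicted = predict_nb(train_nb(training), text)
        if predicted != reported:
            flagged.append((i, text, reported, predicted))
    return flagged

flagged = flag_misclassified(records)
error_rate = len(flagged) / len(records)  # share of records flagged
for index, text, reported, predicted in flagged:
    print(f"record {index}: {text!r} reported as {reported}, predicted {predicted}")
print(f"estimated misclassification rate: {error_rate:.1%}")
```

On this toy data only the deliberately misreported record is flagged. A disagreement between predicted and reported category marks a likely misclassification rather than a confirmed one, so in practice flagged cases would presumably be reviewed, or the flagging threshold tuned, before expenditure estimates are recomputed.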

Authors who are presenting talks have a * after their name.

American Statistical Association