Activity Number: 463 - Novel Uses of Text Analysis in Government Agencies
Type: Topic Contributed
Date/Time: Wednesday, August 1, 2018 : 8:30 AM to 10:20 AM
Sponsor: Government Statistics Section
Abstract #327275 | Presentation
Title: Identifying Misclassifications in Consumer Expenditure Data
Author(s): Clayton Knappenberger*
Companies: U.S. Bureau of Labor Statistics
Keywords: text analysis; classification error; machine learning
Abstract:
Classification error occurs when a survey response is recorded in an incorrect category. This is a common and well-studied problem in survey data. When classification error is correlated with explanatory variables of interest, it can bias estimates. Typically, researchers have relied on subsequent re-interviews or on administrative data to estimate the presence and extent of classification error. This report presents a novel approach to estimating and addressing classification error in the Bureau of Labor Statistics' Consumer Expenditure Quarterly Interview (CEQ) data. I use respondent-provided text descriptions of purchases to train a predictive model whose output is a predicted item category for each expenditure. This prediction is then compared to the reported category to identify likely cases of item misclassification. I estimate a classification error rate of approximately 6% for a single expenditure category and generate new expenditure estimates for that category after correcting for the identified misclassifications.
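
The predict-then-compare step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the model or the validation scheme, so the multinomial naive Bayes classifier and the leave-one-out loop below are assumptions chosen for brevity, and the purchase descriptions and category labels are invented toy records, not CEQ data. Each record is scored by a model trained on all other records, so its own reported category cannot influence its prediction.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real text needs more cleaning)."""
    return text.lower().split()


def train(records):
    """Count documents per category and words per category."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, category in records:
        class_counts[category] += 1
        for token in tokenize(text):
            word_counts[category][token] += 1
            vocab.add(token)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the category with the highest naive Bayes log score."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_category, best_score = None, -math.inf
    for category in sorted(class_counts):
        score = math.log(class_counts[category] / n_docs)
        # Laplace (add-one) smoothing over the training vocabulary.
        denom = sum(word_counts[category].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # tokens unseen in training are skipped
                score += math.log((word_counts[category][token] + 1) / denom)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


def flag_mismatches(records):
    """Leave-one-out: flag records whose predicted category differs from the reported one."""
    flagged = []
    for i, (text, reported) in enumerate(records):
        model = train(records[:i] + records[i + 1:])
        predicted = predict(model, text)
        if predicted != reported:
            flagged.append((text, reported, predicted))
    return flagged


# Hypothetical toy records: (respondent description, reported category).
records = [
    ("mens cotton shirt", "apparel"),
    ("cotton shirt", "apparel"),
    ("womens cotton shirt", "apparel"),
    ("winter jacket", "apparel"),
    ("mens winter jacket", "apparel"),
    ("grocery milk bread", "food"),
    ("milk eggs", "food"),
    ("grocery bread", "food"),
    ("restaurant dinner", "food"),
    ("restaurant lunch", "food"),
    ("blue cotton shirt", "food"),  # deliberately misreported in this toy example
]

flagged = flag_mismatches(records)
for text, reported, predicted in flagged:
    print(f"{text!r}: reported={reported}, predicted={predicted}")
print(f"share flagged: {len(flagged) / len(records):.2%}")
```

A flagged record is only a candidate misclassification, since the model itself can be wrong. The share flagged in this toy run says nothing about the approximately 6% figure reported in the abstract.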
Authors who are presenting talks have a * after their name.