Online Program

Friday, May 18
Survey Data
Fri, May 18, 3:00 PM - 3:45 PM
Regency Ballroom B

Performance Evaluation of Machine Learning Algorithms by K-Fold and Leave-One-Out Cross Validation for Classification of Survey Write-in Responses (304482)

Presentation

*Andrea Roberson, U.S. Census Bureau 

Keywords: U.S. Census Bureau, machine learning, predictive models

The Annual Capital Expenditures Survey (ACES) provides detailed and timely information on capital investment in structures and equipment by nonfarm businesses during the year. The data are used to improve the quality of economic indicators of business investment, as well as estimates of gross domestic product. Studies conducted by the U.S. Census Bureau have assessed procedures and targeted areas for improvement in Economic Directorate survey processing. Recent research has shown that machine learning is an effective technique for reducing the workload of analysts who review and edit write-in responses. SABLE (Scraping Assisted By LEarning) is a tool developed by the U.S. Census Bureau that will classify the “Other” category into multi-label descriptions: Structures, Equipment, and Not Applicable. This tool will deploy a logistic regression model into a production system for the ACES 2018 survey year. Performance metrics such as classification accuracy are essential for assessing the utility of this classifier. Both k-fold and leave-one-out cross-validation are preferred methods for evaluating the performance of classification algorithms. This work compares the two validation schemes in the context of building effective machine learning models from text-based data, with the aim of drawing insights from our data to inform best practices for choosing a cross-validation method.
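To make the comparison concrete, the following is a minimal sketch of how k-fold and leave-one-out cross-validation differ mechanically: leave-one-out is simply k-fold with k equal to the number of observations. It is not the SABLE implementation. The write-in strings and labels are invented for illustration, and the token-overlap classifier is a toy stand-in for the logistic regression model named in the abstract; all function names are hypothetical.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds over n items."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def train_token_counts(texts, labels):
    """Count tokens per label. Toy stand-in for a real text classifier."""
    counts = {}
    for text, label in zip(texts, labels):
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts


def predict(counts, text):
    """Pick the label whose training tokens overlap most with the text."""
    tokens = text.lower().split()
    # sorted() makes tie-breaking deterministic (alphabetically first label wins)
    return max(sorted(counts), key=lambda lab: sum(counts[lab][t] for t in tokens))


def cross_val_accuracy(texts, labels, k):
    """Fraction of items classified correctly when each is held out once."""
    n = len(texts)
    correct = 0
    for train, test in k_fold_indices(n, k):
        model = train_token_counts([texts[i] for i in train],
                                   [labels[i] for i in train])
        correct += sum(predict(model, texts[i]) == labels[i] for i in test)
    return correct / n


# Invented write-in responses (assumption: not real ACES data).
texts = [
    "office building renovation", "forklift equipment purchase", "none",
    "warehouse building roof", "computer equipment servers", "not applicable none",
    "parking structure building", "delivery truck equipment", "nothing none",
]
labels = [
    "Structures", "Equipment", "Not Applicable",
    "Structures", "Equipment", "Not Applicable",
    "Structures", "Equipment", "Not Applicable",
]

kfold_acc = cross_val_accuracy(texts, labels, k=3)          # 3-fold
loo_acc = cross_val_accuracy(texts, labels, k=len(texts))   # leave-one-out
print(f"3-fold accuracy: {kfold_acc:.2f}")
print(f"leave-one-out accuracy: {loo_acc:.2f}")
```

In this sketch, leave-one-out fits the model once per observation while 3-fold fits it only three times, which illustrates the cost side of the trade-off between the two schemes. A production comparison would typically also shuffle or stratify the folds, which this sketch omits.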