Abstract Details

Activity Number: 248 - Machine Learning in Science and Industry
Type: Contributed
Date/Time: Monday, July 29, 2019, 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #304220
Title: Using Machine Learning to Assign North American Industry Classification System Codes to Establishments Based on Business Description Write-Ins
Author(s): Brian Dumbacher* and Anne Russell
Companies: U.S. Census Bureau and U.S. Census Bureau
Keywords: U.S. Census Bureau; Economic Census; North American Industry Classification System; business establishments; machine learning; text classification

The U.S. Census Bureau classifies business establishments according to the North American Industry Classification System (NAICS). NAICS groups establishments into industries based on the activities in which they are primarily engaged. The Census Bureau uses NAICS for many purposes, such as stratifying establishments for sample selection and tailoring survey questionnaires to respondents. To assign NAICS codes to establishments, the Census Bureau draws on information from several sources, including the Economic Census, the Internal Revenue Service, and the Social Security Administration. Parts of the NAICS coding process can be labor-intensive, expensive, and time-consuming, and they can introduce systematic errors that are difficult to diagnose. Assigning codes in a more automated, model-based way can address these disadvantages. In this paper, we review NAICS autocoding efforts and explore machine learning and text classification methods for assigning NAICS codes from business description write-in responses to the Economic Census. Models are trained on write-ins from the 2012 Economic Census and applied to write-ins from the 2017 Economic Census. We also discuss associated concerns and challenges.

Authors who are presenting talks have a * after their name.