Abstract:
|
The U.S. Census Bureau classifies business establishments according to the North American Industry Classification System (NAICS). NAICS groups establishments into industries based on the activities in which they are primarily engaged. The Census Bureau uses NAICS for many purposes, such as stratifying establishments for sample selection and tailoring survey questionnaires to respondents. To assign NAICS codes to establishments, the Census Bureau uses information from several sources, such as the Economic Census, the Internal Revenue Service, and the Social Security Administration. Aspects of NAICS coding can be labor-intensive, expensive, and time-consuming, and can introduce systematic errors that are difficult to diagnose. Assigning codes in a more automated way using models can address these disadvantages. In this paper, we review NAICS autocoding efforts and explore machine learning and text classification methods for assigning NAICS codes using business description write-in responses to the Economic Census. We train models on write-ins from the 2012 Economic Census and apply them to write-ins from the 2017 Economic Census. We also discuss associated concerns and challenges.
|