TBD
Industry Classification Using Machine Learning: An Application to the Economic Census (308052)
*Anne Sigda Russell, U.S. Census Bureau
Javier Miranda, US Census Bureau
Justin Clifford Smith, US Census Bureau
Keywords: Machine Learning, Natural Language Processing, Industry Classification, NAICS, Census Bureau, establishments, firms, classifier
The US Census Bureau spends considerable time and resources identifying and classifying the industry of establishments in the U.S. using the North American Industry Classification System (NAICS). This information is critical to Census Bureau statistical products. The burden this collection places on businesses is considerable as well. We have developed a natural language processing and machine learning pipeline that uses a novel combination of proprietary and public data to predict complete NAICS codes. We focus on how this new approach can be integrated into the typical analyst workflow, how it can be used to automate the easy-to-solve cases, leaving analysts with more resources to tackle difficult-to-classify establishments, and how it could be used to eliminate entire survey operations while maintaining, and conceivably improving, data quality and lowering costs.
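The idea of automating easy cases and leaving hard ones to analysts can be illustrated with a minimal sketch. It assumes that a classifier already returns a probability for each candidate NAICS code and that a confidence threshold decides the routing. The abstract does not describe the routing rule, so the function name `triage`, the 0.9 threshold, the establishment IDs, and the scores below are all hypothetical.

```python
from typing import Dict, List, Tuple


def triage(
    scores: Dict[str, Dict[str, float]], threshold: float = 0.9
) -> Tuple[Dict[str, str], List[str]]:
    """Split establishments into auto-coded cases and an analyst review queue.

    `scores` maps an establishment ID to {NAICS code: predicted probability}.
    The threshold is an illustrative assumption, not a value from the abstract.
    """
    auto_coded: Dict[str, str] = {}
    review_queue: List[str] = []
    for est_id, code_probs in scores.items():
        if not code_probs:
            # No prediction available: always send to an analyst.
            review_queue.append(est_id)
            continue
        # Pick the NAICS code with the highest predicted probability.
        best_code, best_prob = max(code_probs.items(), key=lambda kv: kv[1])
        if best_prob >= threshold:
            auto_coded[est_id] = best_code
        else:
            review_queue.append(est_id)
    return auto_coded, review_queue


# Made-up scores for two hypothetical establishments.
example_scores = {
    "est_001": {"722511": 0.97, "722513": 0.03},
    "est_002": {"541511": 0.55, "541512": 0.45},
}
auto_coded, review_queue = triage(example_scores, threshold=0.9)
```

In this sketch, raising the threshold sends more establishments to analysts and lowers the risk of a wrong automatic code. Lowering it automates more cases at the cost of accuracy.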