
All Times EDT

Abstract Details

Activity Number: 393 - NLP and Text Analysis
Type: Contributed
Date/Time: Wednesday, August 10, 2022 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Learning and Data Science
Abstract #322073
Title: Industry Self-Classification in the Economic Census
Author(s): Brian Dumbacher* and Daniel Whitehead
Companies: U.S. Census Bureau and U.S. Census Bureau
Keywords: Economic Census; hierarchical modeling; information retrieval; machine learning; NAICS; text classification
Abstract:

This paper describes the methodology behind BEACON, a tool that respondents to the 2022 Economic Census will use to self-designate their establishment’s North American Industry Classification System (NAICS) code. BEACON, which stands for Business Establishment Automated Classification of NAICS, takes a respondent-provided business description as input and returns a list of candidate NAICS codes from which the respondent chooses. BEACON is based on text analysis, machine learning, and information retrieval. Its training dataset contains over 3.7 million observations from sources such as past Economic Census responses and Internal Revenue Service data. The paper shows how BEACON uses ensemble and hierarchical modeling techniques to propose relevant NAICS codes, and it discusses results from a recent Economic Census field test.
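The abstract describes BEACON's approach only at a high level. The following is a minimal, hypothetical sketch of the general idea (ranking candidate codes for a free-text description by combining several text models and using the NAICS hierarchy), not BEACON's actual implementation. It assumes: TF-IDF retrieval over pooled per-code text, an equal-weight ensemble of a word-level and a character-trigram model, and a simple hierarchical step in which each 6-digit code borrows evidence from its 2-digit sector. The training records, `make_classifier`, `RetrievalModel`, and the `sector_weight` parameter are all invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy training records (description, NAICS code), invented for
# illustration. Codes: 722511 full-service restaurants, 722513 limited-service
# restaurants, 541511 custom computer programming, 811111 general automotive repair.
TRAINING = [
    ("full service restaurant with table service", "722511"),
    ("sit down restaurant serving dinner", "722511"),
    ("fast food burgers and fries counter service", "722513"),
    ("pizza delivery and carry out", "722513"),
    ("custom software development and programming", "541511"),
    ("writing computer programs for clients", "541511"),
    ("auto repair shop for brakes and engines", "811111"),
]


def words(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def char_trigrams(text):
    """Character 3-grams over the normalized text, padded with spaces."""
    s = " " + " ".join(words(text)) + " "
    return [s[i:i + 3] for i in range(len(s) - 2)]


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * b.get(f, 0.0) for f, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class RetrievalModel:
    """TF-IDF retrieval: all training text for a label is pooled into one document."""

    def __init__(self, records, featurize, label_of=lambda code: code):
        self.featurize = featurize
        docs = defaultdict(Counter)
        for text, code in records:
            docs[label_of(code)].update(featurize(text))
        df = Counter()
        for counts in docs.values():
            df.update(counts.keys())  # document frequency of each feature
        n = len(docs)
        self.idf = {f: math.log((1 + n) / (1 + d)) + 1.0 for f, d in df.items()}
        self.vectors = {lab: self._weight(c) for lab, c in docs.items()}

    def _weight(self, counts):
        # Features never seen in training get weight 0.
        return {f: c * self.idf.get(f, 0.0) for f, c in counts.items()}

    def scores(self, text):
        query = self._weight(Counter(self.featurize(text)))
        return {lab: cosine(query, vec) for lab, vec in self.vectors.items()}


def average(models, text):
    """Equal-weight ensemble: mean of each model's per-label scores."""
    per_model = [m.scores(text) for m in models]
    return {lab: sum(s[lab] for s in per_model) / len(per_model) for lab in per_model[0]}


def make_classifier(records, sector_weight=0.5):
    sector = lambda code: code[:2]  # 2-digit NAICS sector
    code_models = [RetrievalModel(records, words), RetrievalModel(records, char_trigrams)]
    sector_models = [RetrievalModel(records, words, sector),
                     RetrievalModel(records, char_trigrams, sector)]

    def suggest(description, k=3):
        """Return the top-k candidate 6-digit codes for a business description."""
        code_scores = average(code_models, description)
        sector_scores = average(sector_models, description)
        # Hierarchical smoothing: each 6-digit code borrows evidence from its sector.
        combined = {c: (1 - sector_weight) * s + sector_weight * sector_scores[sector(c)]
                    for c, s in code_scores.items()}
        return sorted(combined, key=combined.get, reverse=True)[:k]

    return suggest


suggest = make_classifier(TRAINING)
print(suggest("pizza delivery"))  # '722513' ranks first
```

Returning a short ranked list rather than a single code mirrors the self-classification setting described above, where the respondent makes the final choice. The sector-level term helps when a description matches a sector well but matches no single 6-digit code strongly.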


Authors who are presenting talks have a * after their name.

Back to the full JSM 2022 program