672 – Hypothesis Testing, Matching, and Coding
Creating an Automated Industry and Occupation-Coding Process for the American Community Survey
Michael Kornbau
U.S. Census Bureau
Matthew Thompson
U.S. Census Bureau
Julie Vesely
U.S. Census Bureau
Every year the American Community Survey (ACS) collects data on millions of individuals, including the industry and occupation in which they work. This data, however, is collected in the form of write-ins. To produce estimates from it, each industry and occupation write-in must be assigned a 4-digit code indicating a specific industry or occupation. The coding of industry and occupation for the ACS is a massive operation. Every year over 2 million industry and occupation write-ins are assigned census codes, and this number continues to grow. Each of these cases is reviewed by a clerk and assigned a code. To reduce costs, an automated process was developed that assigns industry and occupation codes from the write-in fields, with a logistic regression model used to determine the best code.
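The abstract does not describe how candidate codes are generated or which predictors the logistic regression model uses, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes a small hypothetical phrase index (`OCCUPATION_INDEX`, with placeholder codes that are not real census codes), two illustrative features (token overlap and exact match), hand-set coefficients standing in for fitted ones, and a hypothetical confidence threshold below which a case would still go to a clerk.

```python
import math

# Hypothetical phrase index; the 4-digit codes are placeholders, not census codes.
OCCUPATION_INDEX = {
    "registered nurse": "0001",
    "nurse aide": "0002",
    "truck driver": "0003",
}

# Illustrative coefficients (assumption). In practice they would be
# estimated from write-ins that clerks have already coded.
INTERCEPT = -3.0
COEF_OVERLAP = 4.0
COEF_EXACT = 3.0


def tokenize(text):
    """Lowercase the write-in and split it on whitespace."""
    return text.lower().split()


def features(write_in, phrase):
    """Return (token overlap, exact-match flag) for a write-in and an index phrase."""
    w_tokens, p_tokens = tokenize(write_in), tokenize(phrase)
    w_set, p_set = set(w_tokens), set(p_tokens)
    union = w_set | p_set
    overlap = len(w_set & p_set) / len(union) if union else 0.0
    exact = 1.0 if w_tokens == p_tokens else 0.0
    return overlap, exact


def probability_correct(write_in, phrase):
    """Logistic model: estimated probability that the phrase's code is correct."""
    overlap, exact = features(write_in, phrase)
    z = INTERCEPT + COEF_OVERLAP * overlap + COEF_EXACT * exact
    return 1.0 / (1.0 + math.exp(-z))


def assign_code(write_in, threshold=0.9):
    """Pick the highest-probability code, or None to refer the case to a clerk."""
    best_code, best_p = None, 0.0
    for phrase, code in OCCUPATION_INDEX.items():
        p = probability_correct(write_in, phrase)
        if p > best_p:
            best_code, best_p = code, p
    if best_p >= threshold:
        return best_code, best_p
    return None, best_p
```

With these assumed values, `assign_code("Registered Nurse")` returns the placeholder code `"0001"`, while an ambiguous write-in such as `"nurse"` falls below the threshold and returns `None` for the code, which in this sketch means the case would be left for clerical review.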