All Times ET
Program is Subject to Change
Modernizing CFS Collection and Operations with Machine Learning (307990)
*Christian Moscardi, US Census Bureau
Keywords: commodity flow, transportation, freight, machine learning, AI, GIS, aerial imagery
The Commodity Flow Survey (CFS) has implemented two separate machine learning processes to improve data quality and to reduce operational costs and respondent burden.
First, we have implemented an "autocoder" that classifies free-text product descriptions. We have developed several applications of this technology that save time and money for both respondents and Census staff while improving production data quality and the resulting estimates. In particular, we have imputed or relabeled codes for approximately 200,000 shipment records from the 2017 CFS. We have also developed an "AI-Assisted" product code search tool, which shows analysts the 10 most likely product codes according to our model, narrowing 514 possible codes to 10 and saving substantial lookup time. We are investigating making this tool public, based on interest from state-level Departments of Transportation and CFS data users. Finally, we plan to deploy the model for respondents in future CFS efforts; we conservatively estimate that, in its current state, it will save over $2M of respondent time.
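The top-10 lookup described above can be illustrated with a minimal sketch. It assumes a toy bag-of-words naive Bayes scorer as a stand-in for the actual CFS model, which is not described here. The training examples, the code labels (FRUIT, METAL, FUEL), and the function names are all hypothetical.

```python
import heapq
import math
from collections import Counter, defaultdict

# Hypothetical training data: (free-text description, product code).
# Descriptions and code labels are invented for illustration only.
TRAINING = [
    ("fresh apples and pears", "FRUIT"),
    ("frozen orange juice", "FRUIT"),
    ("steel pipe fittings", "METAL"),
    ("aluminum sheet metal", "METAL"),
    ("gasoline and diesel fuel", "FUEL"),
    ("diesel fuel drums", "FUEL"),
]


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def train(examples):
    """Count words per code, documents per code, and the vocabulary."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, code in examples:
        class_counts[code] += 1
        for tok in tokenize(text):
            word_counts[code][tok] += 1
            vocab.add(tok)
    return word_counts, class_counts, vocab


def score(text, model):
    """Return a log-score per code (naive Bayes with add-one smoothing)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for code in class_counts:
        logp = math.log(class_counts[code] / total_docs)
        denom = sum(word_counts[code].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                logp += math.log((word_counts[code][tok] + 1) / denom)
        scores[code] = logp
    return scores


def top_k_codes(text, model, k=10):
    """Return the k highest-scoring codes, best first."""
    scores = score(text, model)
    return heapq.nlargest(k, scores, key=scores.get)


MODEL = train(TRAINING)
print(top_k_codes("diesel fuel", MODEL, k=2))
```

With 514 candidate codes and k=10, the same ranking step would leave an analyst about 2% of the list to review, provided the correct code is usually among the top-ranked candidates.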
Second, we have begun exploring the use of GIS features, aerial imagery, and machine learning to identify areas where shipping activity is likely or unlikely to occur, with the goal of better targeting establishments for inclusion in the CFS sample. We will share promising results and applications from the work so far, including the identification of establishments in the 2017 CFS that are likely out of scope even though they responded to the survey and reported shipping activity.