All Times EDT

Abstract Details

Activity Number: 208 - Survey Estimation
Type: Contributed
Date/Time: Tuesday, August 4, 2020 : 10:00 AM to 2:00 PM
Sponsor: Government Statistics Section
Abstract #312393
Title: Assessing the Quality of a Coding Process Generated by a Machine Learning Algorithm
Author(s): Richard Laroche* and Pier-Olivier Tremblay
Companies: Statistics Canada and Statistics Canada
Keywords: automated coding; machine learning; quality
Abstract:

The Retail Commodity Survey (RCS) collects detailed information about retail commodity sales in Canada. Its objective is to produce national-level estimates of the sales of various commodities for 12 retail subsectors. The RCS uses the North American Product Classification System (NAPCS) to classify commodities. Statistics Canada now receives scanner data from some major Canadian retailers. These scanner data files are received on a daily or weekly basis and contain information about products and sales, but they do not include NAPCS codes. An automated coding approach was developed using machine learning techniques to assign a NAPCS code to every product description found in the scanner data files. To assess the performance of the automated coding, a quality framework was developed. Different strategies were put in place, ranging from basic checks when a new scanner data file is received to the manual coding of a sample of products. This will allow the model to be evaluated over time, especially as new products appear. Based on this evaluation, the model will be improved if required.
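
As an illustration only, the manual coding of a sample could feed an accuracy estimate along the following lines. This is a minimal sketch, not the authors' framework: it assumes a simple random sample of products (the abstract does not state the sample design), exact-match agreement between the model's code and the manual code, and a normal-approximation (Wald) confidence interval. The function names, product identifiers, and the placeholder codes "A", "B", and "C" are all hypothetical and are not real NAPCS codes.

```python
import math
import random


def draw_sample(product_ids, n, seed=0):
    """Draw a simple random sample of product ids for manual coding (assumed design)."""
    rng = random.Random(seed)
    return rng.sample(list(product_ids), min(n, len(product_ids)))


def estimate_accuracy(model_codes, manual_codes, z=1.96):
    """Estimate coding accuracy from the manually coded sample.

    model_codes:  dict of product id -> code assigned by the model
    manual_codes: dict of product id -> code assigned by a human coder
                  (only the sampled products appear here)
    Returns (accuracy, lower, upper), where the bounds come from a
    normal-approximation (Wald) interval clipped to [0, 1].
    """
    n = len(manual_codes)
    if n == 0:
        raise ValueError("the manually coded sample is empty")
    # Count products where the model and the human coder agree exactly.
    agree = sum(1 for pid, code in manual_codes.items()
                if model_codes.get(pid) == code)
    p = agree / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Made-up example data: four products, placeholder codes.
model_codes = {"p1": "A", "p2": "B", "p3": "A", "p4": "C"}
manual_codes = {"p1": "A", "p2": "B", "p3": "C", "p4": "C"}

accuracy, lower, upper = estimate_accuracy(model_codes, manual_codes)
print(f"accuracy={accuracy:.2f}, interval=({lower:.2f}, {upper:.2f})")
```

Repeating such a calculation on each new sample would give one way to track the model over time; with small samples or accuracy near 1, an interval other than the Wald interval may be more appropriate.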


Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program