Abstract:
|
The Retail Commodity Survey (RCS) collects detailed information about retail commodity sales in Canada. Its objective is to produce national estimates of the sales of various commodities for 12 retail subsectors. The RCS uses the North American Product Classification System (NAPCS) to classify commodities. Statistics Canada now receives scanner data from some major Canadian retailers. These scanner data files are received on a daily or weekly basis and contain information about products and sales. However, the files do not include NAPCS codes. An automated coding approach was therefore developed, using machine learning techniques, to assign a NAPCS code to every product description found in the scanner data files. To assess the performance of the automated coding, a quality framework was developed. Different strategies were put in place, ranging from basic checks when a new scanner data file is received to the manual coding of a sample of products. These strategies will allow the model to be evaluated over time, especially as new products appear. Based on this evaluation, the model will be improved if required.
|