Keywords: Neural Networks, Machine Learning, Official Statistics
The Survey of Household Spending (SHS) collects household expenditure data through a computer-assisted personal interview and a one-week expenditure diary, in which respondents may provide shopping receipts instead of transcribing the purchased items by hand. The receipts are sent to headquarters, where they are scanned into an electronic format. Information is currently extracted from the imaged receipts manually: coders examine each scanned receipt and key in details such as the items bought, their prices, the store name, the purchase total, and the date of purchase. This process places a heavy burden on the agency in time and budget, since more than 30,000 shopping receipts must be captured in every SHS production cycle. This paper gives an overview of a recent project that evaluated the feasibility of automating parts of the receipt capture process with machine learning. Specifically, convolutional neural networks, trained on a set of previously captured SHS receipts, were used to develop a strategy for automatically extracting and classifying store logos on shopping receipts from specific retailers. The results of this small-scale feasibility project are promising and suggest that other parts of receipt capture could also be automated.