Name: 2019 Joint Statistical Meetings
Start: 2019-07-27T07:00:00+00:00
End: 2019-08-01
Location: Colorado Convention Center

Activity Number:	119 - Statistical Data Editing Modernisation
Type:	Topic Contributed
Date/Time:	Monday, July 29, 2019 : 8:30 AM to 10:20 AM
Sponsor:	Government Statistics Section
Abstract #302979	Presentation
Title:	Improving Efficiency of Imputation Using Machine Learning
Author(s):	Katie Davies* and Vinayak Anand-Kumar
Companies:	Office for National Statistics and Office for National Statistics
Keywords:	Machine Learning; Imputation
Abstract:	Missing data can be problematic because they may reduce the accuracy and reliability of statistics. Imputation creates values and/or units that fill in the missingness, in an effort to create a dataset that is more representative of the population and concept of interest. Ideally, imputation methods would be informed by the nature of the missingness and developed using the data available. Unfortunately, imputation models are not always empirically tested, owing to the large volume of data or timeliness constraints. Methodology, in collaboration with the Data Science Campus, investigated the use of supervised Machine Learning (ML) to carry out imputation using an automated, data-driven approach that would be faster than the current manual/multi-stage approach. The project used an ML software package called XGBoost to impute missing values directly and compared this with the standard approach. The presentation will cover the key concepts behind XGBoost and the findings from this program of work.
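
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: treating imputation as a supervised learning problem and comparing the result with a simpler baseline. Everything in it is an assumption made for illustration. The data are synthetic, the names (`fit_boosted`, `predict`, `mae`) are hypothetical, a small hand-rolled gradient-boosted ensemble of one-split trees stands in for XGBoost, and mean imputation stands in for the "standard approach", which the abstract does not specify.

```python
import random
from statistics import mean

def fit_stump(x, residuals):
    """Find the one-split tree on x that minimises squared error on residuals."""
    best = None
    for t in sorted(set(x))[:-1]:  # skip the max so the right side is never empty
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lm, rm = mean(left), mean(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def fit_boosted(x, y, rounds=50, lr=0.3):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = mean(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, lm, rm = fit_stump(x, residuals)
        stumps.append((t, lm, rm))
        pred = [pi + lr * (lm if xi <= t else rm) for xi, pi in zip(x, pred)]
    return base, lr, stumps

def predict(model, xi):
    base, lr, stumps = model
    return base + lr * sum(lm if xi <= t else rm for t, lm, rm in stumps)

# Synthetic data (assumption): y depends on x, and 20% of y is set to missing.
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(200)]
y_true = [3 * xi + rng.gauss(0, 1) for xi in x]
missing = set(rng.sample(range(200), 40))
y_obs = [None if i in missing else y for i, y in enumerate(y_true)]

# Train on the observed records only, then predict the missing ones.
obs_x = [xi for xi, yi in zip(x, y_obs) if yi is not None]
obs_y = [yi for yi in y_obs if yi is not None]
model = fit_boosted(obs_x, obs_y)
y_ml = [predict(model, xi) if yi is None else yi for xi, yi in zip(x, y_obs)]

# Baseline (assumption): replace every missing value with the observed mean.
y_mean = [mean(obs_y) if yi is None else yi for yi in y_obs]

def mae(imputed):
    """Mean absolute error on the masked records, where the truth is known."""
    return mean(abs(imputed[i] - y_true[i]) for i in missing)

mae_ml, mae_mean = mae(y_ml), mae(y_mean)
print(f"MAE, boosted imputation: {mae_ml:.2f}")
print(f"MAE, mean imputation:    {mae_mean:.2f}")
```

Masking values whose truth is known is what makes the comparison empirical: both methods are scored on the same held-back records. A real application would use a tested library and many predictors rather than this single-variable toy.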

Authors who are presenting talks have a * after their name.

JSM 2019 Online Program