All Times EDT

Abstract Details

Activity Number: 25 - Modern Techniques in Handling Missing Data with Challenging Data Structures Including Big and Small Data Files
Type: Topic Contributed
Date/Time: Monday, August 3, 2020 : 10:00 AM to 11:50 AM
Sponsor: Survey Research Methods Section
Abstract #312906
Title: Missing Data Analysis Using Machine Learning
Author(s): Chao Xu* and Sixia Chen
Companies: University of Oklahoma Health Sciences Center and University of Oklahoma Health Sciences Center
Keywords: Missing data; Machine learning; Deep learning; High-dimensional data; Big data; Imputation
Abstract:

Advances in data collection and storage technology produce large volumes of data for clinical and basic science research, such as electronic health/medical records with hundreds of variables or more. Machine-learning methods are commonly used for data imputation and are promising for handling complicated correlations in big data. However, the statistical properties of these methods, deep learning in particular, are not well studied. A practical guide for applying machine-learning methods to missing data analysis is urgently needed. We therefore design a comprehensive simulation study of missing data analysis to evaluate the performance of classical statistical methods, high-dimensional models, classical machine-learning methods, and deep learning. The simulation considers low- and high-dimensional data and linear and non-linear correlations among variables. We compare the imputation bias and variance of the different methods. Our study will provide guidance for investigators wishing to use machine-learning methods for data imputation and promote further machine-learning-based applications and theoretical study.
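
The abstract does not give the simulation design, so the following is only a minimal sketch of how imputation bias and variance might be compared across replicates. Everything in it is an assumption made for illustration: the data-generating models, the logistic missing-at-random mechanism, the two simple imputation methods (observed-mean and linear-regression imputation, standing in for the richer methods the authors evaluate), and the hypothetical function names `ols_fit`, `simulate_once`, and `run_simulation`. The target quantity is the mean of y after imputation.

```python
import math
import random
import statistics


def ols_fit(x, y):
    """Return (intercept, slope) of a simple least-squares line."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def simulate_once(rng, n, nonlinear):
    """Draw one data set, delete y at random given x, and impute two ways."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    if nonlinear:
        # Assumed non-linear model: E[y] = E[exp(x)] = exp(0.5)
        y = [math.exp(xi) + rng.gauss(0.0, 1.0) for xi in x]
    else:
        # Assumed linear model: E[y] = 1
        y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

    # Assumed MAR mechanism: P(y missing) = logistic(x), so y is lost
    # more often when x is large.
    observed = [rng.random() >= 1.0 / (1.0 + math.exp(-xi)) for xi in x]
    x_obs = [xi for xi, o in zip(x, observed) if o]
    y_obs = [yi for yi, o in zip(y, observed) if o]

    # Method 1: fill every missing y with the observed mean.
    y_bar = statistics.fmean(y_obs)
    mean_imputed = [yi if o else y_bar for yi, o in zip(y, observed)]

    # Method 2: fill every missing y with its fitted value from a line in x.
    b0, b1 = ols_fit(x_obs, y_obs)
    reg_imputed = [yi if o else b0 + b1 * xi
                   for xi, yi, o in zip(x, y, observed)]

    return statistics.fmean(mean_imputed), statistics.fmean(reg_imputed)


def run_simulation(reps=200, n=300, nonlinear=False, seed=0):
    """Estimate bias and variance of the imputed mean of y for each method."""
    rng = random.Random(seed)
    true_mean = math.exp(0.5) if nonlinear else 1.0
    estimates = {"mean": [], "regression": []}
    for _ in range(reps):
        m, r = simulate_once(rng, n, nonlinear)
        estimates["mean"].append(m)
        estimates["regression"].append(r)
    return {
        name: {
            "bias": statistics.fmean(values) - true_mean,
            "variance": statistics.variance(values),
        }
        for name, values in estimates.items()
    }


if __name__ == "__main__":
    for nonlinear in (False, True):
        label = "non-linear" if nonlinear else "linear"
        print(label, run_simulation(nonlinear=nonlinear))
```

Under the assumed linear model, mean imputation should show a clearly negative bias because y is observed mostly at small x, while regression imputation should be close to unbiased. The non-linear setting is included only to show where a linear imputation model might break down.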


Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program