All Times EDT

Abstract Details

Activity Number: 54 - Record Linkage, Data Integration, and Improving Survey Measurement
Type: Contributed
Date/Time: Sunday, August 8, 2021 : 3:30 PM to 5:20 PM
Sponsor: Survey Research Methods Section
Abstract #318018
Title: Machine Learning-Based Data Integration Procedures for Handling Nonprobability Sample
Author(s): James Cutler* and Sixia Chen and Chao Xu
Companies: University of Oklahoma Health Sciences Center and University of Oklahoma Health Sciences Center and University of Oklahoma Health Sciences Center
Keywords: machine learning; mass imputation; survey methodology; non-probability sample
Abstract:

Nonprobability samples arise frequently in practice, in medicine, epidemiology, public opinion research, and other fields. It is well known that naïve estimates based on nonprobability samples may suffer from selection bias. Data integration, which combines information from nonprobability and probability samples, has been shown to be an effective way to handle nonprobability samples. However, the validity of data integration approaches depends on the underlying model assumptions. Modern machine learning approaches, including generalized additive modeling, random forest, XGBoost, and deep learning, have been shown to be somewhat robust against the failure of those model assumptions. In this paper, we compare different machine learning-based data integration approaches via a simulation study and a real application. XGBoost and deep learning approaches are shown to outperform the other machine learning approaches in terms of balancing bias and variance.
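The abstract does not spell out the estimator, so the following is only a minimal sketch of one common data integration procedure, mass imputation (listed in the keywords), under assumptions that are not from the abstract: the outcome y is observed only in the nonprobability sample, a covariate x is observed in both samples, and the probability sample carries design weights. A simple k-nearest-neighbor regressor stands in for the learners named in the abstract (random forest, XGBoost, deep learning); all function names, the simulated population, and the selection mechanism are hypothetical.

```python
import random


def knn_predict(train_x, train_y, x, k=25):
    """Predict y at x as the mean outcome of the k nearest training points.

    Stand-in learner for illustration only (1-D covariate).
    """
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def mass_imputation_mean(np_x, np_y, prob_x, prob_w, k=25):
    """Mass-imputation estimate of the population mean of y.

    The model is trained on the nonprobability sample (np_x, np_y), used to
    impute y for every unit of the probability sample, and the imputed values
    are averaged with the probability sample's design weights prob_w.
    """
    imputed = [knn_predict(np_x, np_y, x, k) for x in prob_x]
    return sum(w * yhat for w, yhat in zip(prob_w, imputed)) / sum(prob_w)


def simulate(seed=0, pop_size=20000, n_prob=500):
    """Hypothetical simulation: compare the naive mean with mass imputation."""
    rng = random.Random(seed)
    pop_x = [rng.uniform(0, 1) for _ in range(pop_size)]
    pop_y = [2.0 + 3.0 * x * x + rng.gauss(0, 0.5) for x in pop_x]
    true_mean = sum(pop_y) / pop_size

    # Nonprobability sample: the chance of inclusion rises with x, so units
    # with large y are over-represented (assumed selection mechanism).
    np_idx = [i for i in range(pop_size) if rng.random() < 0.02 + 0.18 * pop_x[i]]
    np_x = [pop_x[i] for i in np_idx]
    np_y = [pop_y[i] for i in np_idx]
    naive = sum(np_y) / len(np_y)

    # Probability sample: simple random sample without replacement; only x is
    # used, and every unit has design weight N / n.
    prob_idx = rng.sample(range(pop_size), n_prob)
    prob_x = [pop_x[i] for i in prob_idx]
    prob_w = [pop_size / n_prob] * n_prob

    estimate = mass_imputation_mean(np_x, np_y, prob_x, prob_w)
    return {"true": true_mean, "naive": naive, "mass_imputation": estimate}


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

In this toy setup the naive mean of the nonprobability sample is pulled upward by the selection mechanism, while the mass-imputation estimate relies on the probability sample to represent the population. Swapping `knn_predict` for another learner changes only the imputation step, which is the kind of comparison the abstract describes.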


Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program