
All Times EDT

Abstract Details

Activity Number: 139 - Improving Population Inference Using Statistical Data Integration
Type: Topic Contributed
Date/Time: Monday, August 8, 2022 : 10:30 AM to 12:20 PM
Sponsor: Survey Research Methods Section
Abstract #323140
Title: Integrating Probability and Non-Probability Samples Through Machine Learning Based Methods
Author(s): Sixia Chen and Chao Xu* and James Cutler
Companies: University of Oklahoma Health Sciences Center and University of Oklahoma Health Sciences Center and University of Oklahoma Health Sciences Center
Keywords: non-probability samples; probability samples; machine learning; deep learning; generalized additive modeling
Abstract:

Although probability samples have been regarded as the gold standard for collecting information in population-based studies, non-probability samples are frequently used in practice because of their low cost, their convenience, and the difficulty of creating sampling frames. Naïve estimates based on non-probability samples without any adjustment may be misleading because of selection bias. Recently, data integration approaches, including mass imputation, propensity score weighting, and calibration, have been used to improve the representativeness of non-probability samples. However, the effectiveness of the mass imputation approach depends on the underlying model assumptions. In this paper, we propose and compare several modern machine learning (ML) based mass imputation approaches, including generalized additive modeling (GAM), regression trees, random forests, XGBoost, support vector machines, and deep learning. We evaluate the proposed methods in terms of relative bias, relative standard error, and relative root mean squared error, using both a simulation study and a real-data application. The ML-based methods outperformed GAM when there were non-linear correlations in the data.
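
As a rough illustration of the mass imputation idea and the three evaluation metrics named in the abstract, the sketch below fits an outcome model on a non-probability sample, predicts the outcome for every unit of a probability sample, and takes the design-weighted mean of those predictions. Everything in it is an assumption made for illustration: the simulated population, the selection mechanism, the function names `mass_imputation_mean` and `relative_metrics`, and the metric formulas (the abstract does not define them). A polynomial least-squares fit stands in for the outcome model; it is not one of the authors' methods, and any of the learners listed in the abstract could be substituted at that step.

```python
import numpy as np

rng = np.random.default_rng(2022)

# Hypothetical finite population with a non-linear outcome model.
N = 20_000
x = rng.uniform(0.0, 2.0, size=N)
y = 1.0 + x**2 + rng.normal(0.0, 0.5, size=N)
true_mean = y.mean()

# Assumed selection mechanism for the non-probability sample:
# inclusion depends on x only, so units with large x are over-represented.
p_select = 1.0 / (1.0 + np.exp(-(x - 2.0)))


def mass_imputation_mean(x_np, y_np, x_p, w_p, degree=2):
    """Fit the outcome model on the non-probability sample, impute y for
    every probability-sample unit, and return the weighted mean."""
    coefs = np.polyfit(x_np, y_np, deg=degree)   # stand-in outcome model
    y_hat = np.polyval(coefs, x_p)               # imputed outcomes
    return np.sum(w_p * y_hat) / np.sum(w_p)


def relative_metrics(estimates, truth):
    """Relative bias, relative standard error, and relative RMSE
    over Monte Carlo replicates (assumed definitions)."""
    est = np.asarray(estimates, dtype=float)
    rel_bias = (est.mean() - truth) / truth
    rel_se = est.std(ddof=1) / truth
    rel_rmse = np.sqrt(np.mean((est - truth) ** 2)) / truth
    return rel_bias, rel_se, rel_rmse


n_prob, n_reps = 500, 50
naive_est, mi_est = [], []
for _ in range(n_reps):
    # Non-probability sample: y and x observed, inclusion mechanism unknown.
    in_np = rng.random(N) < p_select
    # Probability sample: simple random sample; only x is used, y is imputed.
    idx_p = rng.choice(N, size=n_prob, replace=False)
    w_p = np.full(n_prob, N / n_prob)            # design weights

    naive_est.append(y[in_np].mean())            # unadjusted estimate
    mi_est.append(mass_imputation_mean(x[in_np], y[in_np], x[idx_p], w_p))

naive_metrics = relative_metrics(naive_est, true_mean)
mi_metrics = relative_metrics(mi_est, true_mean)
print("naive  (rel. bias, rel. SE, rel. RMSE):", naive_metrics)
print("mass imputation (rel. bias, rel. SE, rel. RMSE):", mi_metrics)
```

In this toy setup the outcome model is correctly specified and selection depends only on the covariate, which is why imputation removes most of the selection bias. With a misspecified model (for example `degree=1` here), the imputed estimate would inherit the model error, which is the sensitivity the abstract points to.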


Authors who are presenting talks have a * after their name.

Back to the full JSM 2022 program