Abstract Details

Activity Number: 391 - Leveraging Disparate Sources of Data and Machine Learning to Improve Causal Inference
Type: Topic Contributed
Date/Time: Tuesday, July 30, 2019 : 2:00 PM to 3:50 PM
Sponsor: ENAR
Abstract #306655 Presentation
Title: Bayesian Inference for Sample Surveys in the Presence of High-Dimensional Auxiliary Information
Author(s): Yutao Liu* and Andrew Gelman and Qixuan Chen
Companies: Columbia University and Columbia University and Columbia University
Keywords: Bayesian Additive Regression Trees (BARTs); Finite Population Total; High-Dimensional Auxiliary Information; Sample Surveys

Survey inference can be compromised when the survey sample is not representative of the population, whether it is an imperfect probability sample or a non-probability sample drawn without a probability sampling design. We consider improving survey inference from a potentially non-representative sample in the presence of high-dimensional auxiliary information that is measured in the survey sample and is also available for the population from sources such as census data or administrative records. We propose Bayesian model-based predictive methods for estimating finite population totals by modeling the conditional distribution of the survey outcome using Bayesian additive regression trees (BARTs), which naturally handle high-dimensional auxiliary variables and allow for possible interactions and nonlinear associations. In addition to the auxiliary variables, inspired by Little and An (2004), we estimate the propensity score for a unit to be included in the sample using another BART and include it as a covariate in the outcome model to achieve robust inference for the population total. We show through simulation studies and a real survey that the Bayesian model-based methods using BARTs improve survey inference.
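
The structure of the estimator described above can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a k-nearest-neighbour averager as a stand-in for BART so that the example needs only the standard library, and it returns a single point estimate rather than posterior draws of the total. The names knn_fit and predictive_total, the choice k=5, and the simulated population are all hypothetical.

```python
import math
import random


def knn_fit(X, y, k=5):
    """Return a predictor that averages y over the k nearest training rows.

    Stand-in learner only: flexible and nonparametric, but NOT a BART.
    """
    X = [tuple(row) for row in X]
    y = list(y)
    k = min(k, len(X))

    def predict(row):
        nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], row))[:k]
        return sum(y[i] for i in nearest) / k

    return predict


def predictive_total(X_pop, in_sample, y_sample, fit=knn_fit):
    """Model-based predictive estimate of a finite population total.

    X_pop     : auxiliary rows for all N population units
    in_sample : 0/1 inclusion indicator for each unit
    y_sample  : outcomes of the sampled units, in population order
    """
    # Step 1: estimate each unit's inclusion propensity from the auxiliaries.
    propensity = fit(X_pop, in_sample)
    p_hat = [propensity(tuple(row)) for row in X_pop]

    # Step 2: fit the outcome model on sampled units, with the estimated
    # propensity appended as an extra covariate.
    X_aug = [tuple(row) + (p,) for row, p in zip(X_pop, p_hat)]
    sampled = [i for i, s in enumerate(in_sample) if s]
    unsampled = [i for i, s in enumerate(in_sample) if not s]
    outcome = fit([X_aug[i] for i in sampled], y_sample)

    # Step 3: observed outcomes plus predictions for the non-sampled units.
    return sum(y_sample) + sum(outcome(X_aug[i]) for i in unsampled)


if __name__ == "__main__":
    # Hypothetical simulated population with a non-representative sample:
    # units with larger x1 are more likely to be included.
    random.seed(0)
    N = 300
    X_pop = [(random.random(), random.random()) for _ in range(N)]
    y_pop = [10 + 5 * a + 3 * a * b + random.gauss(0, 0.5) for a, b in X_pop]
    in_sample = [1 if random.random() < 0.1 + 0.5 * a else 0 for a, _ in X_pop]
    y_sample = [y for y, s in zip(y_pop, in_sample) if s]

    naive = N * sum(y_sample) / len(y_sample)
    print("true total      :", round(sum(y_pop), 1))
    print("naive expansion :", round(naive, 1))
    print("predictive total:", round(predictive_total(X_pop, in_sample, y_sample), 1))
```

Passing a different fit function lets the same three steps be run with another learner. A BART implementation would additionally produce posterior draws of the predictions, from which a posterior distribution for the total could be formed.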

Authors who are presenting talks have a * after their name.

Back to the full JSM 2019 program