Abstract:
|
With the widespread availability of Big Data, concerns have been raised about finite population inference based on such large-scale non-probability samples. In the presence of a benchmark survey with relevant auxiliary variables, one can apply a doubly robust adjustment that combines pseudo-weights with a prediction model to further protect against model misspecification. Traditionally, inverse propensity scores are used as pseudo-weights, but this method lacks adequate justification when auxiliary variables are only partially observed. We propose a theoretically valid alternative approach to augment the prediction model in non-probability sample settings. Since the true model is often unknown, and Big Data tend to be poor in model-relevant covariates, we employ Bayesian additive regression trees, which provide a flexible non-parametric predictive tool. In addition, a bootstrap method is adopted to incorporate the uncertainty in both the pseudo-weights and the outcome variable into variance estimation. Taking the National Household Travel Survey 2017 as the benchmark, we apply our method to improve the generalizability of naturalistic driving data from the Strategic Highway Research Program 2.
|