Activity Number: 508 - Leading the Estimates Towards Known Benchmarks
Type: Topic Contributed
Date/Time: Wednesday, August 1, 2018 : 10:30 AM to 12:20 PM
Sponsor: Survey Research Methods Section
Abstract #327253 | Presentation
Title: Calibrating Big Data for Population Inference: Applying Quasi-Randomization Approach to Naturalistic Driving Data Using Bayesian Additive Regression Trees
Author(s): Ali Rafei and Michael Elliott* and Carol A.C. Flannagan
Companies: University of Michigan and University of Michigan and University of Michigan, Transport Research Institute
Keywords: Big Data; inference; calibration; quasi-randomization; pseudo-weighting; predictive mean matching
Abstract:
Big Data are a "big challenge" for population inference because their selection mechanisms are unknown. When data are imbalanced, a larger sample size can exacerbate the selection bias problem relative to sampling variance. One potential way to mitigate this issue is to treat Big Data as a nonprobability sample and apply the quasi-randomization approach (QRA) used for calibrating nonprobability samples. QRA generates a set of weights by estimating pseudo-inclusion probabilities with the help of a benchmark survey that shares a set of auxiliary variables with the Big Data. The present study aims to improve the representativeness of naturalistic driving data from the Safety Pilot Model Deployment, using the National Household Travel Survey as the benchmark. To reduce the risk of model misspecification, we employed Bayesian Additive Regression Trees (BART), which provide a flexible predictive tool by accommodating non-linear associations as well as high-order interactions. A modified jackknife method was used to incorporate the variability in both the Big Data and the benchmark survey into the variance estimates. The simulation results indicate that using BART significantly improves the performance of QRA.
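The pseudo-weighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: to stay dependency-free it replaces BART with a saturated cell model on a single categorical auxiliary variable, and it omits the jackknife variance step. The function names, cell labels, outcome values, and survey weights below are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def pseudo_weights(big_cells, ref_cells, ref_weights):
    """Return one pseudo-weight per Big Data unit (illustrative helper).

    big_cells:   auxiliary-variable cell of each Big Data unit
    ref_cells:   auxiliary-variable cell of each benchmark-survey unit
    ref_weights: survey weight of each benchmark-survey unit
    """
    # Unweighted count of Big Data units in each cell.
    n_big = defaultdict(int)
    for cell in big_cells:
        n_big[cell] += 1

    # Weighted benchmark total in each cell (estimated population size).
    n_ref = defaultdict(float)
    for cell, w in zip(ref_cells, ref_weights):
        n_ref[cell] += w

    weights = []
    for cell in big_cells:
        # Saturated-model propensity of belonging to the Big Data sample
        # when both samples are stacked (assumption: stands in for BART).
        p = n_big[cell] / (n_big[cell] + n_ref[cell])
        # Pseudo-weight is the inverse odds of selection.
        weights.append((1.0 - p) / p)
    return weights


def weighted_mean(values, weights):
    """Weighted mean of the outcome (Hajek-type ratio estimator)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Toy data, entirely made up: the Big Data sample over-represents "young".
big_cells = ["young"] * 6 + ["old"] * 2
big_y = [10.0] * 6 + [20.0] * 2
ref_cells = ["young", "young", "old", "old"]
ref_weights = [100.0, 100.0, 300.0, 300.0]

w = pseudo_weights(big_cells, ref_cells, ref_weights)
naive = sum(big_y) / len(big_y)
adjusted = weighted_mean(big_y, w)
print("naive mean:", naive)
print("pseudo-weighted mean:", adjusted)
```

In this toy setup the unweighted mean is 12.5, while the pseudo-weighted mean moves to about 17.5, the value implied by the benchmark's 200:600 population split between the two cells. With many auxiliary variables a saturated cell model becomes infeasible, which is where a flexible propensity model such as BART would take over.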
Authors who are presenting talks have a * after their name.