Abstract:
|
Recent trends in U.S. official statistics are characterized by the rising cost of data collection and increased nonresponse. At the same time, inexpensive data from commercial nonrandom web panels have become readily available. This paper discusses the possibility of capitalizing on both sources of information by calibrating data from web panels to estimates from conventional randomized surveys. We treat the probability of being included in a web panel as analogous to a response probability and use propensity score adjustment (PSA) of sampling weights and generalized calibration to produce g-weights, thus potentially making nonrandom samples representative of the general population. The simulation study discussed in this paper demonstrates that a propensity score model or calibration using covariates correlated with both the web sample indicator and the target variable can eliminate the bias of estimates from the web sample. Variances estimated using a two-phase sampling approach match the Monte Carlo variances of the point estimators. If the web panel inclusion probability depends on the target variable Y, bias can be removed by using Y as an instrumental variable and calibrating on a closely correlated covariate. However, this approach may lead to a significant increase in variances. The conclusions of the simulation study were validated by applying the same methods to a real web sample covering a subset of the National Health Interview Survey (NHIS) questions. The NHIS public-use file was used as an auxiliary data source for calculating estimates from the web sample data and for their subsequent validation. The g-weight-adjusted estimates from the web and random NHIS samples matched within the limits of statistical significance.
|
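The two-step adjustment summarized above (PSA of the web-panel weights, then calibration to produce g-weights) can be illustrated with a minimal sketch. Everything here is an assumption rather than the paper's own procedure: the simulated population, the odds-based pseudo-weight (1 - p)/p from a logistic model fitted to the stacked web and reference samples, and linear (GREG-type) calibration as one simple instance of generalized calibration. The helper names `fit_logistic`, `psa_pseudo_weights`, and `linear_g_weights` are hypothetical. The instrumental-variable variant and the two-phase variance estimator are not shown.

```python
import numpy as np


def fit_logistic(X, z, w, n_iter=50, tol=1e-10):
    """Weighted logistic regression fitted by Newton-Raphson (X includes an intercept column)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (z - p))                       # weighted score vector
        hess = X.T @ (X * (w * p * (1.0 - p))[:, None])  # weighted information matrix
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def psa_pseudo_weights(X_web, X_ref, d_ref):
    """Hypothetical helper: odds-based pseudo-weights (1 - p) / p for web units.

    Web units (z = 1, weight 1) are stacked with the reference probability
    sample (z = 0, design weights d_ref), so p / (1 - p) roughly tracks
    each web unit's inclusion probability.
    """
    n_web, n_ref = X_web.shape[0], X_ref.shape[0]
    X = np.column_stack([np.ones(n_web + n_ref), np.vstack([X_web, X_ref])])
    z = np.concatenate([np.ones(n_web), np.zeros(n_ref)])
    w = np.concatenate([np.ones(n_web), d_ref])
    beta = fit_logistic(X, z, w)
    p = 1.0 / (1.0 + np.exp(-X[:n_web] @ beta))
    return (1.0 - p) / p


def linear_g_weights(X, w, totals):
    """Hypothetical helper: linear calibration, g_i = 1 + x_i' lam with sum_i w_i g_i x_i = totals."""
    M = X.T @ (X * w[:, None])
    lam = np.linalg.solve(M, totals - X.T @ w)
    return 1.0 + X @ lam


# Simulated population (assumed): the target y depends linearly on covariate x.
rng = np.random.default_rng(2024)
N, n_ref = 20_000, 1_000
x = rng.normal(size=N)
y = 2.0 + 3.0 * x + rng.normal(size=N)
true_mean = y.mean()

# Web panel: Poisson sampling whose inclusion probability grows with x.
web = rng.random(N) < 1.0 / (1.0 + np.exp(3.0 - x))
# Reference survey: simple random sample with design weights N / n_ref.
ref = rng.choice(N, size=n_ref, replace=False)
d_ref = np.full(n_ref, N / n_ref)

# Step 1: propensity score adjustment of the web-unit weights.
w_psa = psa_pseudo_weights(x[web][:, None], x[ref][:, None], d_ref)

# Step 2: calibrate on totals of [1, x] estimated from the reference sample.
X_web = np.column_stack([np.ones(web.sum()), x[web]])
totals = np.array([d_ref.sum(), d_ref @ x[ref]])
w_final = w_psa * linear_g_weights(X_web, w_psa, totals)

naive = y[web].mean()
est = w_final @ y[web] / w_final.sum()
print(f"true {true_mean:.3f}  naive {naive:.3f}  PSA+calibration {est:.3f}")
```

In this toy setup the naive web-sample mean is pulled upward because units with large x (and hence large y) join the panel more often, while the adjusted estimate lands near the population mean. Since y is linear in x here, calibration on [1, x] alone would also remove most of the bias; the PSA step is meant to help when the calibration covariates do not fully capture how selection relates to the target variable.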