Abstract:
|
The proliferation of inexpensive datasets from Internet surveys stimulates interest in statistical techniques for valid inference from web samples. We consider estimation of population and domain means in a setup where the web sample contains variables of interest and covariates that are shared with an auxiliary random sample. First, we propose an “implicit” logistic regression for estimating the parameters of a web response propensity model in the two-sample setup. The proposed estimator relies on additional information in the form of inclusion probabilities of the random sample, specified for the web sample units. Second, we propose an estimator of the population mean based on the estimated web response propensity. This makes inference from web samples similar to the well-established techniques used for observational studies and missing data problems. The proposed approach is tested in simulation and then applied to real data from the National Health Interview Survey (NHIS). An adaptive LASSO-based estimator is modified to accommodate the implicit logistic regression and is tested in simulations involving a multivariate, approximately specified model.
|