Abstract:
|
Consider estimation of the population total $T_y$ for an outcome or study variable $y$ from a low-budget purposive sample $\tilde{s}$ with the aid of an ongoing high-budget reference probability sample $s^*$ that has no data on $y$ but does have data on common auxiliary variables or covariates $x$. Using Royall's model-based approach, a prediction estimator can be constructed from $\tilde{s}$ with known totals of $x$ or their estimates from $s^*$ under the postulated assumption that the model holds for $\tilde{s}$ and its complement, i.e., the nonselected units. Using Särndal's design-based approach, a GREG (generalized regression) estimator can be constructed from $\tilde{s}$ after estimating the sample inclusion propensities using the calibration approach under the postulated assumption that the model holds for $\tilde{s}$ and its complement. By treating the problem as a complete missing data problem for $s^*$, a new estimator iGREG (i for imputation) can be constructed from $s^*$ after imputing $y$ for all units in $s^*$ by using $\tilde{s}$ as the donor dataset under a model whose validity can be partially tested using $x$ observed in both samples. Analogous to the quasi-design-based approach in probability samples with nonresponse, we start with a design-based approach using the reference sample $s^*$, but build over it by integrating $y$-information from the purposive sample $\tilde{s}$ under an imputation model. This approach is termed model-over-design (MOD) integration following Singh (2015). The information on the differences between imputed and observed values of $x$ provides extra covariates with the constraints of zero control totals to reduce the imputation bias via weight calibration. Variance estimates for iGREG can be obtained by extending results under the reverse framework for nonresponse imputation in probability samples (where the respondent subsample serves as the donor dataset) to the case of complete missingness by design where an external dataset ($\tilde{s}$) serves as the donor dataset. Limited simulation results are presented for illustration.
|