Abstract:
|
Semi-supervised (SS) inference has received considerable attention in recent years. In SS settings, apart from a moderately sized labeled dataset L, one has a much larger unlabeled dataset U available, with |U| >> |L|, which makes these settings distinct from standard missing data problems. However, most of the SS literature implicitly assumes that L and U are equally distributed. There is hardly any work under missing at random (MAR) type labeling with selection bias, which is far more realistic but also quite challenging because the propensity score (PS) inevitably decays in this setting. To address this major gap, we consider SS estimation of the mean response under such MAR settings. We develop an SS double robust (DR) estimator by adapting traditional DR estimators to this extreme setting. We give a complete characterization of its asymptotic properties through a series of results requiring only high-level conditions on the nuisance estimators. A further key challenge is modeling the decaying PS, for which we propose several novel choices and provide detailed results on their properties in both high- and low-dimensional settings. These may be of independent interest.
|