Abstract:
|
It is known that for nonparametric regression with responses missing at random (MAR), under a mild assumption on the smoothness of the regression and nuisance functions, a complete-case approach attains the optimal rate and constant of MISE (mean integrated squared error) convergence. The situation changes dramatically if responses are missing not at random (MNAR), that is, when the probability that a response is available (the availability likelihood) depends on the value of the response. Under MNAR, even consistent estimation of the regression becomes impossible. The only way to unlock the information contained in MNAR data is then to estimate the availability likelihood using an extra sample. If such sampling is possible, how large must the extra sample be to match the performance of an oracle that knows the availability likelihood? In other words, how expensive must the extra sampling be? This question is explored, and it is shown that extra sampling for MNAR data is feasible.
|
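The gap between a complete-case approach and an oracle that knows the availability likelihood can be illustrated with a small simulation. The following is a minimal sketch under assumptions that are not taken from the paper: the regression function is m(x) = x, the noise is standard normal, the hypothetical availability likelihood is logistic in the response, and the regression at x0 = 0.5 is estimated by a simple bin average. The oracle reweights each observed response by the inverse of its availability likelihood. This is a textbook inverse-probability weighting device, not the estimator studied in the paper.

```python
import math
import random


def availability(y: float) -> float:
    """Hypothetical availability likelihood P(A = 1 | Y = y); it depends on y, so data are MNAR."""
    return 1.0 / (1.0 + math.exp(-y))


def simulate(n: int, seed: int = 0):
    """Draw (X, Y, A): X ~ Uniform(0, 1), Y = X + N(0, 1) noise, A = True if Y is observed."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = x + rng.gauss(0.0, 1.0)
        a = rng.random() < availability(y)
        data.append((x, y, a))
    return data


def local_estimates(data, x0: float, h: float):
    """Return (complete-case, oracle-weighted) bin averages of Y over |X - x0| < h.

    Only responses with A = True are used, as in real MNAR data.
    """
    cc_sum = cc_count = 0.0
    w_sum = w_norm = 0.0
    for x, y, a in data:
        if abs(x - x0) < h and a:
            cc_sum += y
            cc_count += 1
            # Oracle step: weight each observed response by 1 / P(A = 1 | Y = y).
            weight = 1.0 / availability(y)
            w_sum += weight * y
            w_norm += weight
    return cc_sum / cc_count, w_sum / w_norm


data = simulate(400_000, seed=1)
cc, oracle = local_estimates(data, x0=0.5, h=0.05)
print(f"true value 0.5, complete case {cc:.3f}, oracle-weighted {oracle:.3f}")
```

Because larger responses are more likely to be observed in this setup, the complete-case average is pulled upward, while the oracle-weighted average stays close to the true value 0.5. In practice the availability likelihood is unknown, and this sketch does not show how to estimate it; that is the role the abstract assigns to the extra sample.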