Abstract:
|
We consider the regression setting where the response variable is subject to nonignorable missingness, i.e., the propensity score depends on the missing values themselves. In such problems, model misspecification and model identifiability are two critical issues. A fully parametric approach can produce results that are sensitive to the model assumptions, while a fully nonparametric approach may not be sufficient for model identification. We propose a new flexible semiparametric propensity score model in which the relationship between the missingness indicator and the partially observed response is left unspecified and estimated nonparametrically, while the relationship between the missingness indicator and the fully observed covariates is modeled parametrically. We consider the exponential family for the complete data and show that the model is identifiable. A semiparametric treatment is employed to construct efficient estimators for the parameters of interest. The finite-sample performance of the proposed estimators is examined through simulation studies. We further illustrate the proposed method via an empirical analysis of an electronic health record application.
|