Keywords: EM algorithm, ES algorithm, incomplete data, Lasso, missing data, model selection, non-parametric methods, semi-parametric methods, wavelet thresholding.
The notion of self-consistency (Efron, 1967) gives an analytical formulation of an ad hoc "chicken-or-egg" idea for dealing with censored data. Knowing the uncensored values, we could easily estimate the survival distribution by one minus the empirical CDF (cumulative distribution function). Knowing the CDF, we could improve our imputation of the censored values via conditional expectations. Iterating these two steps until no further improvement is possible (under the squared loss) leads to a fixed-point equation whose solution is the celebrated Kaplan-Meier estimator. This article reveals that self-consistency is a general dual principle for semi-parametric and non-parametric statistical analysis with incomplete data or, more generally, data with irregular patterns. Statistically, it extends maximum likelihood estimation (MLE), because virtually all parametric MLEs are automatically self-consistent under the squared loss, and it readily accommodates other loss functions. Algorithmically, it generalizes the EM algorithm in the sense that it replaces the complete-data log-likelihood function in EM with an arbitrary complete/regular-data estimator of an object of interest (e.g., a curve or an image). The Achilles' heel of this seemingly almighty methodology is that exact implementation can be prohibitively expensive, because in general it requires solving a multi-dimensional non-linear equation at each iteration. However, using wavelet de-noising and the Lasso as applications, we show that approximate implementations can achieve a practically appealing trade-off, retaining much of both the statistical and the computational efficiency. Theoretically, we demonstrate that the self-consistency formulation lets us transfer the problem of establishing convergence properties of an incomplete-data estimator to the corresponding problem for a complete-data estimator via a contraction map, thereby also enhancing our theoretical toolbox.
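The Kaplan-Meier fixed point described above can be checked numerically. The following is a minimal sketch, not the article's own code, assuming right-censored data and the usual convention that a censoring time tied with a death time is treated as occurring just after the death. It iterates Efron's self-consistency equation S(t) = n^{-1} Σ_i E[1{T_i > t} | data_i, S] on the grid of distinct uncensored times: each uncensored case contributes its indicator, and each case censored at c < t is imputed by the conditional expectation P(T > t | T > c) = S(t)/S(c) under the current estimate. The result is then compared with a direct product-limit computation. The function names `kaplan_meier`, `step_value`, and `self_consistent_survival`, the tolerance, and the small data set are hypothetical illustrations.

```python
import numpy as np


def kaplan_meier(times, observed):
    """Product-limit estimate of S(t) at the distinct uncensored times."""
    grid = np.unique(times[observed])
    surv = np.empty(len(grid))
    s = 1.0
    for j, t in enumerate(grid):
        at_risk = np.sum(times >= t)  # cases censored at t are still at risk at t
        deaths = np.sum((times == t) & observed)
        s *= 1.0 - deaths / at_risk
        surv[j] = s
    return grid, surv


def step_value(grid, surv, c):
    """Evaluate the right-continuous step function S at time c (S = 1 before grid[0])."""
    j = np.searchsorted(grid, c, side="right") - 1
    return 1.0 if j < 0 else surv[j]


def self_consistent_survival(times, observed, tol=1e-12, max_iter=10_000):
    """Iterate S <- (1/n) * sum_i E[1{T_i > t} | data_i, S] until a fixed point."""
    grid = np.unique(times[observed])
    n = len(times)
    cens = times[~observed]
    # Starting value: one minus the empirical CDF, ignoring censoring.
    surv = np.array([np.mean(times > t) for t in grid])
    for _ in range(max_iter):
        new = np.empty_like(surv)
        for j, t in enumerate(grid):
            # Uncensored cases contribute their survival indicator directly.
            total = float(np.sum(times[observed] > t))
            # A case censored at c >= t is known to survive past t.
            total += np.sum(cens >= t)
            # A case censored at c < t is imputed by S(t) / S(c).
            for c in cens[cens < t]:
                total += surv[j] / step_value(grid, surv, c)
            new[j] = total / n
        if np.max(np.abs(new - surv)) < tol:
            return grid, new
        surv = new
    return grid, surv


if __name__ == "__main__":
    times = np.array([3, 5, 5, 7, 8, 10, 12, 13, 15, 18], dtype=float)
    observed = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0], dtype=bool)
    grid, s_sc = self_consistent_survival(times, observed)
    _, s_km = kaplan_meier(times, observed)
    print(grid)
    print(np.max(np.abs(s_sc - s_km)))  # difference between the two estimates
```

The imputed quantity here is the indicator 1{T_i > t}, whose conditional expectation is the squared-loss imputation. Because this self-consistency iteration coincides with an EM algorithm for the nonparametric survival function, it converges only linearly, so the stopping tolerance is an arbitrary illustrative choice.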