Abstract:
|
Estimating total population size from incomplete lists has long been an important problem across the biological and social sciences. For example, multiple organizations construct partial and overlapping lists of casualties in the Syrian civil war, and it is of great interest to use this information to estimate the magnitude of the war's destruction. Earlier approaches to such problems have used either strong parametric assumptions or suboptimal plug-in-type nonparametric techniques; both can lead to substantial bias, the former via model misspecification and the latter via smoothing. Under the identifying assumption that two lists are conditionally independent given covariate information, we make the following advances. First, we derive a nonparametric efficiency bound for estimating the capture probability, based on the efficient influence function. Second, we construct a bias-corrected estimator that attains this bound under weak nonparametric conditions. Finally, we study the finite-sample properties of the proposed estimator in simulations and apply our methods to estimate HIV prevalence in Alameda County, California.
|
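To make the identifying assumption concrete, the following is a minimal sketch of a naive stratified plug-in estimate with a single discrete covariate. It is not the bias-corrected estimator proposed here. It assumes that, among observed units with covariate value x, q1(x), q2(x), and q12(x) denote the probabilities of appearing on list 1, on list 2, and on both. Conditional independence then gives an inverse conditional capture probability of q1(x) q2(x) / q12(x), and the marginal capture probability is the reciprocal of its average over observed units. The function name and the toy data are hypothetical.

```python
from collections import defaultdict


def plugin_capture_estimate(y1, y2, x):
    """Naive stratified plug-in estimate (illustrative, not the proposed estimator).

    y1, y2: 0/1 list-membership indicators for observed units
            (every unit appears on at least one list).
    x:      discrete covariate value for each observed unit.
    Returns (estimated capture probability, estimated population size).
    """
    # Per-stratum counts: [observed, on list 1, on list 2, on both lists].
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for a, b, level in zip(y1, y2, x):
        if a == 0 and b == 0:
            raise ValueError("units on neither list cannot be observed")
        c = counts[level]
        c[0] += 1
        c[1] += a
        c[2] += b
        c[3] += a * b

    total_observed = 0
    sum_inv_gamma = 0.0
    for n, n1, n2, n12 in counts.values():
        if n12 == 0:
            raise ValueError("a stratum has no overlap between the lists")
        # 1 / gamma(x) = q1 * q2 / q12 = (n1/n) * (n2/n) / (n12/n).
        inv_gamma = (n1 * n2) / (n * n12)
        sum_inv_gamma += n * inv_gamma
        total_observed += n

    # Capture probability is the reciprocal of the observed-data mean of 1/gamma.
    psi_hat = total_observed / sum_inv_gamma
    return psi_hat, total_observed / psi_hat


# Made-up toy data: two strata, each unit coded as (on list 1, on list 2).
pairs = ([(1, 1)] * 10 + [(1, 0)] * 20 + [(0, 1)] * 10   # stratum 0
         + [(1, 1)] * 5 + [(1, 0)] * 5 + [(0, 1)] * 5)    # stratum 1
toy_y1 = [p[0] for p in pairs]
toy_y2 = [p[1] for p in pairs]
toy_x = [0] * 40 + [1] * 15

psi_hat, n_hat = plugin_capture_estimate(toy_y1, toy_y2, toy_x)
print(round(psi_hat, 4), round(n_hat, 1))
```

Within each stratum this reduces to the classical two-list ratio (list 1 count times list 2 count, divided by the overlap count). With continuous or high-dimensional covariates, the q functions would have to be estimated by smoothing or regression, which is the setting where plug-in bias arises.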