Abstract:
|
We consider the estimation of two-sample density functionals
T(f,g) = \int f(x)\phi (f(x),g(x),x) dx,
based on independent d-dimensional random vectors X_1,...,X_m with density f and Y_1,...,Y_n with density g. Such functionals arise in many applications: for instance, divergences such as the Kullback-Leibler (KL) divergence and the total variation and Hellinger distances are of this form.
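As an illustration, under the usual conventions (the notation here is ours, with u and v standing for the values f(x) and g(x), and with integrands evaluated on the set where f(x) > 0), the following choices of \phi recover these examples:
\phi(u,v,x) = \log(u/v) gives T(f,g) = \int f(x) \log(f(x)/g(x)) dx, the KL divergence;
\phi(u,v,x) = (v/u)^{1/2} gives T(f,g) = \int (f(x)g(x))^{1/2} dx, so that the squared Hellinger distance, normalised to lie in [0,1], equals 1 - T(f,g);
\phi(u,v,x) = \max(0, 1 - v/u) gives T(f,g) = \int (f(x)-g(x))_+ dx, which equals the total variation distance (1/2)\int |f(x)-g(x)| dx.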
The estimators we consider can be expressed as weighted sums of preliminary estimators based on nearest neighbour distances. We provide conditions under which these estimators are efficient, in the sense of achieving the local asymptotic minimax lower bound, and under which they are asymptotically normal. Beyond their theoretical interest, our results enable the construction of asymptotically valid confidence intervals. They also reveal an interesting phenomenon in which the natural `oracle' estimator, which requires knowledge of f and g, can be outperformed by our estimators. For some functionals of interest we show that the asymptotic limit of the ratio of the L2 risks is strictly less than one, uniformly over suitable classes of densities.
|