We consider a general question of estimating linear functionals of a distribution based on noisy samples. Extending prior results of Donoho-Liu and Juditsky-Nemirovski, we show that by taking the dual optimization problem, the (two-point) LeCam lower bound is in fact achievable by optimizing the bias-variance tradeoff of an empirical-mean type of estimator.
We extend the method to separable functionals of high-dimensional parametric models and apply it to obtain sharp rates of a number of problems in the area of "estimating the unseens". In particular, coupled with tools from complex analysis, this method resolves the optimal rate for Fisher's species problem: given $n$ independent samples drawn from an unknown distribution, the optimal normalized prediction error of the number of unseen symbols in the next (unobserved) $r n$ samples is shown to be within logarithmic factors of $n^{-min\{1/(r+1),1/2\}}$, exhibiting a phase transition from parametric to nonparametric rate at $r=1$. Furthermore, the optimal estimator can be found by solving a linear program of size that is polynomial in $n$.
This is joint work with Yury Polyanskiy (MIT).
|