Abstract:
|
We introduce kernel thinning, a simple algorithm for generating better-than-Monte-Carlo approximations to a target distribution P on R^d, for the purpose of estimating integrals of functions in a reproducing kernel Hilbert space. Kernel thinning is an online algorithm: given an input sequence of n points, it outputs a coreset of size n^{1/2} in O(n^2) computation time. When the input sequence admits O(n^{-1/2}) integration error, the integration error of the kernel thinning coreset is bounded as O_d(n^{-1/2} \sqrt{log n log log n}) for compactly supported P and O_d(n^{-1/2} \sqrt{(log n)^{d+1} log log n}) for sub-exponential P. In comparison, n^{1/2} i.i.d. Monte Carlo points drawn from P suffer Omega(n^{-1/4}) integration error. Our guarantees for sub-exponential distributions match the classical rates of quasi-Monte Carlo point sets on the unit cube but apply to general target distributions on R^d and a variety of infinite-dimensional kernels, including the Gaussian, Matérn, B-spline, Wendland, inverse multiquadric, and hyperbolic secant kernels. We also introduce efficient strategies for generating near-optimal L^\infty coresets of size n^{1/2} with L^\infty error O(\sqrt{(d/n) log n log log n}) for such kernels.
|
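
The Python sketch below illustrates, under heavy simplification, the flavor of the procedure the abstract describes: repeated halving rounds split the current point set into two halves whose kernel mean embeddings stay close in the RKHS, keeping one half each round until roughly n^{1/2} points remain, for O(n^2) kernel evaluations in total. The greedy sign rule here stands in for the paper's randomized self-balancing Hilbert-space walk, and the names (gauss_k, halve, kernel_thin) and the Gaussian kernel choice are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def gauss_k(x, y, sigma=1.0):
    """Gaussian kernel, one of the kernel families covered by the guarantees."""
    d = np.asarray(x) - np.asarray(y)
    return np.exp(-np.dot(d, d) / (2.0 * sigma**2))


def halve(points, k=gauss_k):
    """One halving round: split 2m points into two size-m halves with nearby
    kernel mean embeddings, then keep one half. The greedy sign choice below
    replaces the paper's randomized, threshold-controlled walk (a simplification);
    we also assume len(points) is even."""
    signed = []  # (point, sign) pairs defining the running discrepancy psi
    kept = []
    for i in range(0, len(points) - 1, 2):
        y, z = points[i], points[i + 1]
        # psi(y) - psi(z) = <psi, k(y,.) - k(z,.)> in the RKHS
        gap = sum(s * (k(x, y) - k(x, z)) for x, s in signed)
        s = -1.0 if gap > 0 else 1.0  # sign that shrinks ||psi|| the most
        signed += [(y, s), (z, -s)]
        kept.append(y if s > 0 else z)
    return kept


def kernel_thin(points, k=gauss_k):
    """Halve repeatedly until roughly sqrt(n) points remain."""
    out = list(points)
    while len(out) ** 2 > len(points):
        out = halve(out, k)
    return out


rng = np.random.default_rng(0)
sample = rng.standard_normal((256, 2))  # n = 256 i.i.d. points from P = N(0, I_2)
coreset = kernel_thin(sample)           # n^{1/2} = 16 points
print(len(coreset))
```

Each halving round costs O(m^2) kernel evaluations on m points, so the total over all rounds is dominated by the first and matches the O(n^2) runtime stated above.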