Conference Program Home
  My Program

All Times EDT

Abstract Details

Activity Number: 98 - Student Paper Award and John M. Chambers Statistical Software Award
Type: Topic Contributed
Date/Time: Monday, August 8, 2022 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Graphics
Abstract #322468
Title: Distribution Compression in Near-Linear Time
Author(s): Abhishek Shetty* and Raaz Dwivedi and Lester Mackey
Companies: University of California Berkeley and MIT and Microsoft Research New England
Keywords: Distribution Compression; Markov Chain Monte Carlo; Maximum Mean discrepancy; Reproducing Kernel Hilbert space; thinning

In distribution compression, one aims to accurately summarize a probability distribution $P$ using a small number of representative points. Near-optimal thinning procedures achieve this goal by sampling n points from a Markov chain and identifying $\sqrt{n}$ points with $O(1/\sqrt{n})$ discrepancy. Unfortunately, these algorithms suffer from quadratic or super-quadratic runtime. To address this deficiency, we introduce Compress++, a simple meta-procedure for speeding up any thinning algorithm while suffering at most a factor 4 in error. When combined with the quadratic-time kernel thinning algorithms of Dwivedi and Mackey (2021), Compress++ delivers $\sqrt{n}$ points with $O(\sqrt{ log n /n })$ integration error and better-than-Monte-Carlo maximum mean discrepancy in $O(n log^3 n)$ time. Moreover, Compress++ enjoys the same near-linear runtime given any quadratic-time input and reduces the runtime of super-quadratic algorithms by a square-root factor. In our benchmarks with high-dimensional Monte Carlo samples and Markov chains targeting challenging differential equation posteriors, Compress++ nearly matches the accuracy of its input algorithm in orders of magnitude less time.

Authors who are presenting talks have a * after their name.

Back to the full JSM 2022 program