Online Program Home
My Program

Abstract Details

Activity Number: 286 - Advances in Bayesian Nonparametric Methods and Its Applications
Type: Topic Contributed
Date/Time: Tuesday, July 30, 2019 : 8:30 AM to 10:20 AM
Sponsor: Section on Bayesian Statistical Science
Abstract #306793 Presentation
Title: A Bayesian Nonparametric View on Count-Min Sketch
Author(s): Diana Cai* and Michael Mitzenmacher and Ryan Adams
Companies: Princeton University and Harvard University and Princeton University
Keywords: bayesian nonparametrics; sketching; dirichlet process

The count-min sketch is a time- and memory-efficient randomized data structure that provides a point estimate of the number of times an item has appeared in a data stream. The count-min sketch and related hash-based data structures are ubiquitous in systems that must track frequencies of data such as URLs, IP addresses, and language n-grams. We present a Bayesian view on the count-min sketch, using the same data structure, but providing a posterior distribution over the frequencies that characterizes the uncertainty arising from the hash-based approximation. In particular, we take a nonparametric approach and consider tokens generated from a Dirichlet process (DP) random measure, which allows for an unbounded number of unique tokens. Using properties of the DP, we show that we can straightforwardly compute posterior marginals of the unknown true counts and that the modes of these marginals recover the classical count-min sketch estimator, inheriting the associated probabilistic guarantees. Additionally, the posterior distribution leads to natural shrinkage estimators for the count, and we demonstrate this on text data. Joint work with Michael Mitzenmacher and Ryan P. Adams.

Authors who are presenting talks have a * after their name.

Back to the full JSM 2019 program