Online Program Home
My Program

Abstract Details

Activity Number: 116 - Modern Advances in Record Linkage Using Statistical Learning Methods
Type: Topic Contributed
Date/Time: Monday, July 30, 2018 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Learning and Data Science
Abstract #329823
Title: Breaking Computational Chicken-And-Egg Loop in Adaptive Sampling and Estimations Using Locality Sensitive Sampling (LSS)
Author(s): Anshumali Shrivastava*
Companies: Rice University
Keywords: Estimation; Adaptive Sampling ; Hashing; Locality Sensitve; Machine Learning; Big-Data

: In this talk, I will discuss some of my recent and surprising findings on the use of hashing algorithms for large-scale statistical estimations. Locality Sensitive Hashing (LSH) is a hugely popular algorithm for sub-linear near neighbor search. However, if we look at the proofs of LSH, then it turns out that fundamentally LSH is a constant time (amortized) adaptive sampler which is being used for the near-neighbor search. This observation adds another feather in the cap for LSH. LSH offers a unique capability to do smart sampling and statistical estimations. Our observation bridges data structures (probabilistic hash tables) with efficient unbiased statistical estimations. I will demonstrate how this dynamic and efficient sampling beak the computational barriers in adaptive estimations where, for the first time, it is possible that we pay the cost of uniform sampling but get the benefits of adaptive sampling. I will show how one fundamental idea can break what we call computational chicken-and-egg loop on three favorite problems of 1) partition function estimation, 2) adaptive gradient estimation and 3) Unique entity estimation in record linkage.

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program