JSM 2015 Online Program

Abstract Details

Activity Number: 239
Type: Topic Contributed
Date/Time: Monday, August 10, 2015 : 2:00 PM to 3:50 PM
Sponsor: Survey Research Methods Section
Abstract #317002
Title: Sketches for Stratified Sampling
Author(s): Jack Gorham*
Companies: Stanford University
Keywords: sketches ; stratified ; hypergeometric

Analysts at large web firms are often asked to analyze and process very large amounts of data quickly and iteratively. This often involves formulating a sequence of hypotheses to test, so that the same data is queried many times, each time for a different stratum. Doing so is challenging when each query must be answered quickly and computational resources are constrained. Subsampling the data is one way to reduce both time and computational cost while still supporting statistical insights about the data. Furthermore, when the subsample is small enough to fit in memory, subsampling greatly widens the set of software packages that can be used for analysis. However, drawing a useful subsample is difficult when the data is severely skewed, with a few strata dominating the others in size. This work proposes a novel method for drawing a stratified sample from a data stream when the memory budget is constrained, the data may be highly skewed, and the number of strata of interest is potentially very large.
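
As a point of reference for the problem described above, here is a minimal sketch of a common baseline: one-pass reservoir sampling with a separate fixed-size reservoir per stratum. This is not the method proposed in the talk, which the abstract does not specify. The function name, the `key` callback, and the per-stratum capacity `k` are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_reservoir_sample(stream, key, k, seed=None):
    """Baseline sketch (assumed, not the talk's method): keep a uniform
    random sample of at most k items per stratum in a single pass."""
    rng = random.Random(seed)
    reservoirs = defaultdict(list)  # stratum -> sampled items
    counts = defaultdict(int)       # stratum -> number of items seen so far
    for item in stream:
        stratum = key(item)
        counts[stratum] += 1
        if len(reservoirs[stratum]) < k:
            # Reservoir not yet full: always keep the item.
            reservoirs[stratum].append(item)
        else:
            # Classic reservoir step: the i-th item of a stratum replaces
            # a random slot with probability k / i.
            j = rng.randrange(counts[stratum])
            if j < k:
                reservoirs[stratum][j] = item
    return dict(reservoirs), dict(counts)


# Hypothetical skewed stream: one dominant stratum and one tiny stratum.
stream = [("big", i) for i in range(10000)] + [("small", i) for i in range(5)]
samples, counts = stratified_reservoir_sample(
    stream, key=lambda record: record[0], k=3, seed=0
)
```

Each stratum ends up with at most `k` sampled items however skewed the stream is, so a small stratum is not crowded out by a dominant one. The memory used by this baseline grows with the number of distinct strata, so it does not by itself address the case where the number of strata is very large and the memory budget is fixed.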

Authors who are presenting talks have a * after their name.
