Online Program

Friday, May 18
Machine Learning
Statistical Challenges in Large-Scale Data Mining
Fri, May 18, 1:30 PM - 3:00 PM
Regency Ballroom A

Approximate Data Analytics (304390)

*Christopher Jermaine, Rice University 

Today’s data sets are so large that performing standard analytics over them can be prohibitively expensive, both in time and in the compute power required.

For many years, data analytics and database researchers have studied statistical approximation as a solution to this problem. If one first builds a model of the data set that can be stored in small space, that model can then be queried repeatedly and inexpensively instead of going back to the original data. Many different models for approximate data analytics have been proposed over the years, ranging from methods based on survey sampling to randomized sketching. But while the models and methods of approximate data analytics are all statistical in nature, relatively few statisticians have worked on these problems, or at least there has been little collaboration between statisticians and computer scientists.
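As a minimal illustration of the survey-sampling end of this spectrum (an assumption for illustration, not a method described in the talk), a uniform random sample can serve as the small-space model: it is built once, and an aggregate such as SUM is then estimated from the sample alone, with a confidence interval quantifying the approximation error. The function names `build_sample` and `estimate_sum` are hypothetical; the interval uses a normal approximation with a finite-population correction.

```python
import math
import random


def build_sample(data, n, seed=0):
    """Hypothetical helper: draw a uniform random sample of n rows without
    replacement. This small sample plays the role of the compact 'model'."""
    rng = random.Random(seed)
    return rng.sample(data, n)


def estimate_sum(sample, population_size, z=1.96):
    """Hypothetical helper: estimate SUM over the full data set from a
    uniform sample (requires len(sample) >= 2).

    Returns (estimate, half_width); estimate +/- half_width is an approximate
    confidence interval (normal approximation; z=1.96 gives roughly 95%).
    """
    n = len(sample)
    N = population_size
    mean = sum(sample) / n
    # Sample variance with Bessel's correction.
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    # Finite-population correction, since sampling is without replacement.
    fpc = (N - n) / N
    se_mean = math.sqrt(var / n * fpc)
    # Scale the sample mean (and its standard error) up to a total.
    return N * mean, z * N * se_mean


if __name__ == "__main__":
    data = [float(i % 100) for i in range(1_000_000)]
    sample = build_sample(data, 10_000)  # 1% of the rows
    est, half_width = estimate_sum(sample, len(data))
    print(f"estimate {est:.0f} +/- {half_width:.0f}; exact {sum(data):.0f}")
```

Once the sample exists, each new query touches only 1% of the rows, which is the trade-off the abstract describes: cheap repeated queries in exchange for a quantified error.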

In this talk, I will give an overview of the state of the art in approximate data analytics. I will point out some of the limitations in the state of the art, and focus on problems where ideas and participation from the statistics research community would be especially helpful.