JSM 2011 Online Program

The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.

Activity Details


CE_06C Sun, 7/31/2011, 8:00 AM - 12:00 PM HQ-Poinciana 3 & 4
Data Stream Mining: Tools and Applications — Continuing Education Course
ASA, Section on Statistical Computing
Instructor(s): Simon Urbanek, AT&T Labs Research; Tamraparni Dasu, AT&T Labs Research
Mining data streams is a challenging task, complicated by the dynamic nature of the data, their high rate of accumulation, and limited, one-time access. Streams are also riddled with complex, interdependent data glitches. At the same time, stream mining is important because many critical data mining applications involve streams, such as those from sensor networks, internet traffic, and mobility applications. This tutorial provides a comprehensive approach to data stream mining, with emphasis on tools, techniques, and their application to real-world stream mining problems. We start with an introduction to data streams and discuss the analytical and computing challenges posed by their unique constraints. We present nonparametric methods that are computationally lightweight and well suited to stream mining, and demonstrate some of them through R code that will be part of the course. We use running examples from social networking, sensor networks, and financial ticker streams to illustrate a wide variety of stream mining tasks: nonparametric summaries of the stream; detecting outliers and distributional shifts; computing and updating models for evolving streams; visualizing streams and stream summaries; and measuring data quality and data cleaning. We conclude with an overview of open research problems in statistical stream mining.




For information, contact jsm@amstat.org or phone (888) 231-3473.

If you have questions about the Continuing Education program, please contact the Education Department.