This is the preliminary program for the 2007 Joint Statistical
Meetings in Salt Lake City, Utah.
The views expressed here are those of the individual authors and not necessarily those of the ASA or its board, officers, or staff.
| CE_22C | Tue, 7/31/07, 8:00 AM - 12:00 PM | CC-151 D-F |
| Harnessing Data Streams through Statistical Computing - Continuing Education - Course | ||
| Section on Statistical Computing, ASA | ||
| Instructor(s): Tamraparni Dasu, AT&T Labs - Research, Simon Urbanek, AT&T Labs - Research | ||
| Our tutorial is strongly motivated by the JSM 2007 theme, "Statistics: Harnessing the Power of Information." Data streams are a predominant form of information today, arising in areas and applications ranging from telecommunications, meteorology, and rocketry to the monitoring and support of e-commerce sites. Data streams are characterized by large volumes and high rates of accumulation. They pose unique analytical, statistical, and computing challenges that are just beginning to be addressed. This is an important area to which statisticians can make significant contributions, and one rife with open research problems. In this tutorial, we give an introduction to and overview of the analysis and monitoring of data streams. We discuss the analytical and computing challenges posed by the unique constraints associated with data streams. There is a wide variety of problems: data reduction, characterizing constantly changing distributions, detecting changes in these distributions, computing and updating models for evolving data streams, identifying outliers, tracking rare events, "correlating" multiple data streams, and others. Current work in this area is dominated by the computer science community, which takes a largely algorithmic approach with an emphasis on data queries. However, that work lacks analytical rigor and a strong theoretical framework, and is based mostly on disparate methodologies aimed at solving specific problems. Statisticians can make significant contributions in this area. Statistical computing is an ideal framework for the analysis of data streams. It offers statistical rigor and confidence guarantees, rather than only the performance guarantees typical of algorithmic methods. We give an overview of the existing literature and applications, highlighting opportunities for statistical research where appropriate. We make extensive use of examples and real-life applications to elucidate the material.
In particular, we use major applications that we have worked on as running examples throughout the tutorial. We conclude with a discussion of open research problems in this dynamic area. | ||
JSM 2007
For information, contact jsm@amstat.org
or phone (888) 231-3473. If you have questions about the Continuing Education program,
please contact the Education Department. |