Abstract #301199


The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.





JSM 2002 Abstract #301199
Activity Number: 304
Type: Topic Contributed
Date/Time: Wednesday, August 14, 2002 : 10:30 AM to 12:20 PM
Sponsor: Business & Economics Statistics Section*
Title: EDA in the Age of Very Large Databases
Author(s): Patrick Tendick*+
Affiliation(s): Avaya Labs
Address: 233 Mt. Airy Rd., Basking Ridge, New Jersey, 07920, USA
Keywords: EDA; business intelligence; database; data warehouse; OLAP
Abstract:

In the 1960s and 1970s, John Tukey developed the notion of exploratory data analysis (EDA), which has guided the evolution of statistical data analysis software ever since.

However, the tenets of EDA were formulated before the advent of very large databases (VLDBs). VLDBs (sometimes called data warehouses or data marts) are used in business intelligence (BI) and other online analytical processing (OLAP) applications. Beyond their increased capacity, VLDBs have brought many improvements in the ways we can access data, including query languages, client/server architectures, and metadata. Unfortunately, these improvements generally have not been reflected in statistical software.

This talk revisits EDA from a VLDB perspective. EDA techniques in current use often do not scale well, rely too heavily on detailed data, do not use metadata, and are too rigid. The author presents principles for reapplying Tukey's original approach to take advantage of VLDBs.


  • The address information is for the authors who have a + after their name.
  • Authors who are presenting talks have a * after their name.





Revised March 2002