Abstract:
|
In the 1960s and 1970s, Tukey developed the notion of exploratory data analysis (EDA), an approach that has guided the evolution of statistical data analysis software ever since.
However, the tenets of EDA were formulated before the advent of very large databases (VLDBs). VLDBs (sometimes called data warehouses or data marts) are used in business intelligence (BI) and other online analytical processing (OLAP) applications. Beyond their increased capacity, VLDBs have improved the ways in which we can access data, through query languages, client/server architectures, and metadata. Unfortunately, statistical software has generally not reflected these improvements.
This talk revisits EDA from a VLDB perspective. Many EDA techniques in current use don't scale well, rely too heavily on detailed data, ignore metadata, and are too rigid. The author presents principles for reapplying Tukey's original approach in a way that takes advantage of VLDBs.
|