Abstract #301171


The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.





JSM 2002 Abstract #301171
Activity Number: 142
Type: Topic Contributed
Date/Time: Monday, August 12, 2002 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Graphics*
Title: Selecting Among Categories
Author(s): Martin Theus*+
Affiliation(s): University of Augsburg
Address: Universitaetsstr. 14, Augsburg, International, 86135, Germany
Keywords: Very Large Databases ; 2 Level Data Access ; Categorical Data ; Hot Set Selection ; Selection Sequences ; Mosaic Plots
Abstract:

Most statistical graphics and statistical methods do not scale well beyond thousands or tens of thousands of observations, but large databases easily exceed these limits. One exception is graphs for categorical data, such as barcharts and mosaic plots, which display counts rather than individual records. Fortunately, the data in corporate databases are mostly categorical, which allows even millions of records to be visualized. Classical analysis software cannot handle files of that size, so an analyst is tempted to dump only a subgroup of the data in order to use his or her analysis tool of choice. But choosing a subset a priori can be very cumbersome.

This paper shows how to work on large databases by using displays, selection tools, and techniques for categorical data. Using two-level data access ("do not extract data from the database until the subset is small enough to handle") combined with hot set selections (as implemented in DataDesk), the analyst can work seamlessly on even very large databases within a single tool.

A first implementation of this technique is presented in the research software MONDRIAN.
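
The two-level data access described above can be illustrated with a minimal sketch. It is not MONDRIAN's implementation; it only assumes a SQL database (here an in-memory SQLite table) and represents a selection as a WHERE clause. The table name `customers`, the columns `region` and `segment`, the threshold `MAX_ROWS_IN_MEMORY`, and all function names are hypothetical and chosen for illustration only.

```python
import sqlite3

# Hypothetical threshold for "small enough to handle" in the analysis tool.
MAX_ROWS_IN_MEMORY = 1500


def build_demo_db():
    """Create a small in-memory table standing in for a large corporate database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, region TEXT, segment TEXT)")
    regions = ["north", "south", "east"]
    segments = ["retail", "business"]
    rows = [(i, regions[i % 3], segments[i % 2]) for i in range(6000)]
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    return conn


def category_counts(conn, table, column, where="1=1", params=()):
    """Level 1: the database aggregates; only one row per category is returned.

    The counts are what a barchart needs. Identifiers are interpolated
    directly, so this sketch assumes trusted table and column names.
    """
    sql = (
        f'SELECT "{column}", COUNT(*) FROM "{table}" '
        f'WHERE {where} GROUP BY "{column}" ORDER BY "{column}"'
    )
    return dict(conn.execute(sql, params).fetchall())


def extract_if_small(conn, table, where, params=()):
    """Level 2: fetch raw records only if the selected subset is small enough.

    Returns None when the selection is still too large to extract.
    """
    (n,) = conn.execute(
        f'SELECT COUNT(*) FROM "{table}" WHERE {where}', params
    ).fetchone()
    if n > MAX_ROWS_IN_MEMORY:
        return None
    return conn.execute(f'SELECT * FROM "{table}" WHERE {where}', params).fetchall()
```

In this sketch an analyst would first call `category_counts` to draw a barchart of `region`, refine the selection (for example to one region and one segment), and call `extract_if_small` only once the selection has narrowed; until then the raw records stay in the database.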


  • The address information is for the authors that have a + after their name.
  • Authors who are presenting talks have a * after their name.


Revised March 2002