Abstract #300242


The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.





JSM 2002 Abstract #300242
Activity Number: 138
Type: Topic Contributed
Date/Time: Monday, August 12, 2002 : 2:00 PM to 3:50 PM
Sponsor: Section on Government Statistics*
Title: Having It All
Author(s): Allan Wilks*+
Affiliation(s): AT&T Labs - Research
Address: Room C207, 180 Park Avenue, Florham Park, New Jersey, 07932-0971, USA
Keywords:
Abstract:

A major problem in building a large transaction database is determining the completeness of the data. I will describe my experience in this area with reference to an 8 TB database at AT&T that holds about 200 billion records, with new records arriving at a rate of about 350 million per day. The data is collected from about 500 sources that vary widely in reporting frequency, data volume, and reliability. Do we have a complete list of possible sources? Are all sources reporting? Why does a source become quiet for a period of time? Are we getting everything a source is sending? Does every record we receive make it into the database? What can we say about the completeness of the database with respect to old data? Has any corruption crept in? I will deal with these and other questions of completeness.
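Two of the questions above, "Do we have a complete list of possible sources?" and "Are all sources reporting?", lend themselves to a simple daily check. The sketch below is not the method described in the talk; it is a minimal illustration that assumes hypothetical per-source daily record counts (`counts`), a hypothetical set of expected sources (`known_sources`), and an arbitrary threshold (`ratio=0.5`) for calling a source unusually quiet.

```python
from statistics import median


def check_sources(counts, known_sources, day, history_days, ratio=0.5):
    """Flag simple completeness problems for one day of loads (illustrative sketch).

    counts: {source: {day: record_count}}, hypothetical per-source daily tallies
    known_sources: set of sources we expect to hear from (hypothetical list)
    day: the day being checked
    history_days: earlier days used as a baseline for each source
    ratio: assumed threshold; a source is "low" if today's count < ratio * baseline median
    """
    # Sources that delivered at least one record on the day being checked.
    reporting = {s for s, by_day in counts.items() if by_day.get(day, 0) > 0}

    # Expected sources that sent nothing: "Are all sources reporting?"
    silent = sorted(known_sources - reporting)

    # Sources that reported but are missing from our list:
    # "Do we have a complete list of possible sources?"
    unknown = sorted(reporting - known_sources)

    # Expected sources that reported far less than usual.
    low = []
    for s in sorted(reporting & known_sources):
        past = [counts[s].get(d, 0) for d in history_days]
        if past:
            baseline = median(past)
            if baseline > 0 and counts[s][day] < ratio * baseline:
                low.append(s)

    return {"silent": silent, "unknown": unknown, "low": low}


# Hypothetical example data: three expected sources plus one unlisted source "D".
EXAMPLE_COUNTS = {
    "A": {1: 100, 2: 110, 3: 105},  # steady
    "B": {1: 50, 2: 55, 3: 10},     # sharp drop on day 3
    "C": {1: 20, 2: 0},             # nothing on day 3
    "D": {3: 7},                    # reporting, but not on the expected list
}
EXAMPLE_KNOWN = {"A", "B", "C"}

if __name__ == "__main__":
    print(check_sources(EXAMPLE_COUNTS, EXAMPLE_KNOWN, day=3, history_days=[1, 2]))
```

With the example data, source C is flagged as silent, D as unknown, and B as low. Because the abstract notes that sources differ widely in reporting frequency, a check this naive would also flag a source that legitimately reports only weekly; a more realistic version would compare each source against its own expected schedule rather than a fixed daily baseline.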


  • The address information is for the authors who have a + after their name.
  • Authors who are presenting talks have a * after their name.



For information, contact meetings@amstat.org or phone (703) 684-1221.


Revised March 2002