Activity Number: 138
Type: Topic Contributed
Date/Time: Monday, August 12, 2002, 2:00 PM to 3:50 PM
Sponsor: Section on Government Statistics*

Abstract - #300242

Title: Having It All
Author(s): Allan Wilks*+
Affiliation(s): AT&T Labs - Research
Address: Room C207, 180 Park Avenue, Florham Park, New Jersey, 07932-0971, USA
Keywords:

Abstract:
A major problem in building a large transaction database is determining the completeness of the data. I will describe my experience in this area with reference to an 8 TB database at AT&T that contains about 200 billion records and grows by about 350 million records per day. The data is collected from about 500 sources that vary widely in reporting frequency, data volume, and reliability. Do we have a complete list of possible sources? Are all sources reporting? Why does a source become quiet for a period of time? Are we getting everything a source is sending? Does every record we receive make it into the database? What can we say about the completeness of the database with respect to old data? Has any corruption crept in? I will deal with these and other questions of completeness.
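The abstract does not describe the monitoring machinery itself, but one elementary completeness check it alludes to is flagging sources that have gone quiet relative to their own reporting history. The sketch below is a hypothetical illustration, not the AT&T system: the feed_history data, the quiet_sources helper, and the 3x-median-gap threshold are all assumptions made for the example.

    from datetime import datetime

    # Hypothetical per-source feed history: source id -> arrival timestamps.
    # In a real system this would come from collection logs, not hard-coded data.
    feed_history = {
        "source-A": [datetime(2002, 8, d) for d in range(1, 13)],   # reports daily
        "source-B": [datetime(2002, 8, d) for d in (1, 4, 7, 10)],  # reports every 3 days
        "source-C": [datetime(2002, 8, d) for d in (1, 2, 3)],      # went quiet on Aug 3
    }

    def quiet_sources(history, now, slack=3.0):
        """Flag sources whose silence exceeds `slack` times their typical reporting gap."""
        flagged = []
        for source, arrivals in history.items():
            arrivals = sorted(arrivals)
            if len(arrivals) < 2:
                continue  # not enough history to estimate a reporting interval
            gaps = [(b - a).total_seconds() for a, b in zip(arrivals, arrivals[1:])]
            typical_gap = sorted(gaps)[len(gaps) // 2]  # median inter-arrival gap, seconds
            silence = (now - arrivals[-1]).total_seconds()
            if silence > slack * typical_gap:
                flagged.append((source, silence / 86400.0))  # days of silence
        return flagged

    if __name__ == "__main__":
        for source, days in quiet_sources(feed_history, now=datetime(2002, 8, 12)):
            print(f"{source}: quiet for {days:.1f} days")

With the sample data above, only source-C is flagged (about nine days of silence against a typical one-day gap); a daily reporter and a three-day reporter that are current are left alone, which is the point of scaling the threshold to each source's own cadence.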
- The address information is for authors who have a + after their name.
- Authors who are presenting talks have a * after their name.