Activity Number: 138
Type: Topic Contributed
Date/Time: Monday, August 12, 2002, 2:00 PM to 3:50 PM
Sponsor: Section on Government Statistics*

Abstract - #300242

Title: Having It All
Author(s): Allan Wilks*+
Affiliation(s): AT&T Labs - Research
Address: Room C207, 180 Park Avenue, Florham Park, New Jersey, 07932-0971, USA
Keywords:

Abstract:
A major problem in building a large transaction database is determining the completeness of the data. I will describe my experience in this area with reference to an 8 TB database at AT&T that contains about 200 billion records and grows by about 350 million records per day. The data is collected from about 500 sources that vary widely in reporting frequency, data volume, and reliability. Do we have a complete list of possible sources? Are all sources reporting? Why does a source become quiet for a period of time? Are we getting everything a source is sending? Does every record we receive make it into the database? What can we say about the completeness of the database with respect to old data? Has any corruption crept in? I will deal with these and other questions of completeness.
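The abstract does not describe the monitoring machinery itself, but one elementary completeness check it alludes to is flagging sources that have gone quiet relative to their own reporting history. The sketch below is a hypothetical illustration, not the AT&T system: the feed_history data, the quiet_sources helper, and the 3x-median-gap threshold are all assumptions made for the example.

    from datetime import datetime

    # Hypothetical per-source feed history: source id -> arrival timestamps.
    # In a real system this would come from collection logs, not hard-coded data.
    feed_history = {
        "source-A": [datetime(2002, 8, d) for d in range(1, 13)],   # reports daily
        "source-B": [datetime(2002, 8, d) for d in (1, 4, 7, 10)],  # reports every 3 days
        "source-C": [datetime(2002, 8, d) for d in (1, 2, 3)],      # went quiet on Aug 3
    }

    def quiet_sources(history, now, slack=3.0):
        """Flag sources whose silence exceeds `slack` times their typical reporting gap."""
        flagged = []
        for source, arrivals in history.items():
            arrivals = sorted(arrivals)
            if len(arrivals) < 2:
                continue  # not enough history to estimate a reporting interval
            gaps = [(b - a).total_seconds() for a, b in zip(arrivals, arrivals[1:])]
            typical_gap = sorted(gaps)[len(gaps) // 2]  # median inter-arrival gap, seconds
            silence = (now - arrivals[-1]).total_seconds()
            if silence > slack * typical_gap:
                flagged.append((source, silence / 86400.0))  # days of silence
        return flagged

    if __name__ == "__main__":
        for source, days in quiet_sources(feed_history, now=datetime(2002, 8, 12)):
            print(f"{source}: quiet for {days:.1f} days")

With the sample data above, only source-C is flagged (about nine days of silence against a typical one-day gap); a daily reporter and a three-day reporter that are current are left alone, which is the point of scaling the threshold to each source's own cadence.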
- The address information is for authors who have a + after their name.
- Authors who are presenting talks have a * after their name.