Abstract #300137


The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.




JSM 2002 Abstract #300137
Activity Number: 138
Type: Topic Contributed
Date/Time: Monday, August 12, 2002 : 2:00 PM to 3:50 PM
Sponsor: Section on Government Statistics*
Title: A Simple, Statistical Approach to Data Quality
Author(s): Ashish Sanil*+
Affiliation(s): National Institute of Statistical Sciences
Address: PO Box 14006, Research Triangle Park, North Carolina, 27709-4006, USA
Keywords: Data quality
Abstract:

We present an approach and framework for characterizing data quality in databases assembled for research purposes. The framework addresses data quality both for individual components of the database and for the database itself, viewed as a system of integrated tables. One characteristic of the framework is its use of simple techniques and visualization to identify anomalies at the record, table (in a relational database), and database levels. The framework also encompasses metadata and usability.

We develop the framework and illustrate the usefulness of simple approaches using examples from the Toxic Release Inventory (TRI) and the Intermodal Transportation Database (ITDB).

We also describe a Data Quality toolkit that could automate the generic tasks of data quality checking.
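
As a rough illustration of what checks at the record, table, and database levels might look like, here is a minimal Python sketch. It is not the authors' toolkit: the function names, the field names (facility_id, state, pounds), the sample rows, and the choice of a median-based outlier rule are all assumptions made for this example.

```python
from statistics import median


def record_level_issues(rows, required):
    """Record level: list (row_index, field) pairs where a required field is empty."""
    return [(i, f) for i, row in enumerate(rows) for f in required
            if row.get(f) in (None, "")]


def table_level_outliers(rows, field, threshold=3.5):
    """Table level: indices of values far from the column median.

    Uses the modified z-score based on the median absolute deviation (MAD),
    one simple rule among many; the threshold is an assumed default.
    """
    values = [(i, row[field]) for i, row in enumerate(rows)
              if isinstance(row.get(field), (int, float))]
    if not values:
        return []
    med = median(v for _, v in values)
    mad = median(abs(v - med) for _, v in values)
    if mad == 0:
        return []  # no spread, so the rule cannot rank deviations
    return [i for i, v in values if 0.6745 * abs(v - med) / mad > threshold]


def database_level_orphans(child_rows, parent_rows, key):
    """Database level: indices of child rows whose key has no match in the parent table."""
    parent_keys = {row[key] for row in parent_rows}
    return [i for i, row in enumerate(child_rows) if row.get(key) not in parent_keys]


# Hypothetical sample data, invented for illustration only.
facilities = [
    {"facility_id": 1, "state": "NC"},
    {"facility_id": 2, "state": ""},
]
releases = [
    {"facility_id": 1, "pounds": 100.0},
    {"facility_id": 1, "pounds": 110.0},
    {"facility_id": 2, "pounds": 95.0},
    {"facility_id": 2, "pounds": 105.0},
    {"facility_id": 3, "pounds": 90000.0},
]

missing = record_level_issues(facilities, ["facility_id", "state"])
outliers = table_level_outliers(releases, "pounds")
orphans = database_level_orphans(releases, facilities, "facility_id")
```

On this invented data, the record-level check flags the empty state field, the table-level check flags the one very large release amount, and the database-level check flags the release whose facility_id does not appear in the facilities table.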


  • The address information is for the authors who have a + after their name.
  • Authors who are presenting talks have a * after their name.

Revised March 2002