
Abstract Details

Activity Number: 409
Type: Contributed
Date/Time: Tuesday, August 5, 2014 : 2:00 PM to 2:45 PM
Sponsor: Section on Nonparametric Statistics
Abstract #314043
Title: A Masking Index for Quantifying Hidden Glitches
Author(s): Ji Meng Loh*+, Tamraparni Dasu, and Laure Berti-Equille
Companies: New Jersey Institute of Technology, AT&T Labs Research, and Qatar Computing Research Institute
Keywords: anomaly detection; data cleaning; data quality; masking; missing values
Abstract:

Data glitches are errors in a data set. They are complex entities that often span multiple attributes and records. When different types of glitches co-occur, the presence of one type can hinder the detection of another. This phenomenon is called masking.
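To make the idea concrete, here is a minimal sketch of one way an outlier can be masked by missing values. It assumes, for illustration only, that missing values are encoded with a sentinel such as -999 and that outliers are found with a plain 3-sigma rule. The data, the sentinel, and the function name `zscore_outliers` are hypothetical and do not come from the authors' work.

```python
from statistics import mean, pstdev

SENTINEL = -999.0  # hypothetical missing-value code, assumed for illustration


def zscore_outliers(records, threshold=3.0):
    """Return ids of records whose value lies more than `threshold`
    population standard deviations from the mean (3-sigma rule)."""
    values = [v for _, v in records]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return set()
    return {rid for rid, v in records if abs(v - mu) / sigma > threshold}


# 20 ordinary readings, one genuine outlier (r20), and two missing
# values encoded with the sentinel (m1, m2).
records = [(f"r{i}", 9.0 if i % 2 == 0 else 11.0) for i in range(20)]
records.append(("r20", 30.0))
records += [("m1", SENTINEL), ("m2", SENTINEL)]

# Before treatment: the sentinels drag the mean down and inflate sigma,
# so only they are flagged and the genuine outlier r20 stays hidden.
before = zscore_outliers(records)

# After treating the missing values (here simply dropping sentinel rows),
# the genuine outlier becomes visible.
cleaned = [(rid, v) for rid, v in records if v != SENTINEL]
after = zscore_outliers(cleaned)

print(sorted(before))  # ['m1', 'm2']
print(sorted(after))   # ['r20']
```

The sentinels themselves are flagged, but a user counting only genuine outliers would see none until the missing values are handled. Dropping rows is only one possible treatment; imputation would shift the statistics differently.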

We describe two types of masking and introduce an indicator, called a masking index, for quantifying the hidden glitches. We focus on four specific cases of masking: outliers masked by missing values, outliers masked by duplicates, duplicates masked by missing values, and duplicates masked by outliers.

The masking index is critical for data quality profiling and data exploration: it lets a user measure the extent of masking, and hence how much confidence to place in the data. In this sense, it is a valuable data quality index for measuring the true cleanliness of the data. It also provides an objective, quantitative basis for choosing the anomaly detection method best suited to the glitches present in a given data set. Time permitting, we will also describe results of experiments on synthetic and real-world data sets.
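The abstract does not give the formula for the masking index, so the following sketch uses a hypothetical stand-in: the share of target glitches detected after the masking glitch type is removed that went undetected beforehand (0 means no masking). It applies this stand-in to outliers masked by duplicates and compares two common detectors, a non-robust 3-sigma rule and a robust median/MAD modified z-score, then picks the detector with the lower index. The data, thresholds, function names (`zscore_detector`, `mad_detector`, `masking_index`), and the index definition are all assumptions for illustration, not the authors' method.

```python
from statistics import mean, median, pstdev


def zscore_detector(records, threshold=3.0):
    """3-sigma rule: ids whose value is > threshold population SDs from the mean."""
    values = [v for _, v in records]
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return set()
    return {rid for rid, v in records if abs(v - mu) / sd > threshold}


def mad_detector(records, threshold=3.5):
    """Robust modified z-score: flag ids with 0.6745 * |x - median| / MAD > threshold."""
    values = [v for _, v in records]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return set()
    return {rid for rid, v in records if 0.6745 * abs(v - med) / mad > threshold}


def masking_index(before, after):
    """Hypothetical index: fraction of glitches detected after removing the
    masking glitch type that were NOT detected before (0 = no masking)."""
    if not after:
        return 0.0
    return len(after - before) / len(after)


# Outliers masked by duplicates: the genuine outlier record r20
# appears five times as an exact duplicate.
records = [(f"r{i}", 9.0 if i % 2 == 0 else 11.0) for i in range(20)]
records += [("r20", 30.0)] * 5
deduped = list(dict.fromkeys(records))  # drop exact duplicate records, keep order

detectors = {"zscore": zscore_detector, "mad": mad_detector}
scores = {
    name: masking_index(detect(records), detect(deduped))
    for name, detect in detectors.items()
}
best = min(scores, key=scores.get)  # detector least affected by masking

print(scores)  # {'zscore': 1.0, 'mad': 0.0}
print(best)    # mad
```

The duplicated block pulls the mean and standard deviation toward itself, so the 3-sigma rule misses the outlier until deduplication. The median and MAD stay put as long as the duplicates remain well under half of the records, which is why the robust detector scores 0 here.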


Authors who are presenting talks have a * after their name.






For information, contact jsm@amstat.org or phone (888) 231-3473.


The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.

ASA Meetings Department  •  732 North Washington Street, Alexandria, VA 22314  •  (703) 684-1221  •  meetings@amstat.org
Copyright © American Statistical Association.