
Abstract Details

Activity Number: 533 - SLDS CPapers NEW 2
Type: Contributed
Date/Time: Wednesday, August 1, 2018 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #330842
Title: Predictive Modeling Applied in National Reporter Cleaning
Author(s): Xuemei Pan*, Mary Pritts, and Coby Lu
Companies: (not listed), IBM, and IBM
Keywords: Anomaly detection; entropy; data visualization; correlation; multivariate Gaussian distributions; optimization

Anomaly detection has recently attracted considerable attention in many fields. In our IBM mail transit time measurement study, we developed entropy-based data cleaning rules to optimize the same-day delivery cut-off time for a panel of more than 10,000 national reporters maintained by IBM. Data visualization and correlation analysis were used to define the final reporting cut-off times for different distance buckets. Variables were assessed with the Pearson correlation coefficient to determine which ones influenced the cleaning rules, and graphs of various percentiles were produced so the data could be visualized in a simplified form. Multivariate Gaussian distributions were also used as an unsupervised anomaly detection method in the data cleaning analysis. The recommendations based on these techniques have been adopted and implemented in IBM's large-scale national measurement study (more than 10 million reporter scans annually) and have worked well in practice.
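The abstract does not describe how the multivariate Gaussian step was implemented. As a rough illustration only, the following Python sketch shows the standard form of that technique: estimate a mean vector and covariance matrix from reference records assumed to be mostly clean, then flag any record whose density falls below a threshold epsilon. The two numeric features per scan, the sample values, the function names (`fit_gaussian`, `cholesky`, `log_density`, `flag_anomalies`), and the epsilon value are all hypothetical and are not taken from the study.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def fit_gaussian(rows: Sequence[Sequence[float]]) -> Tuple[Vector, Matrix]:
    """Estimate the mean vector and maximum-likelihood covariance matrix."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / n
            for j in range(d)]
           for i in range(d)]
    return mu, cov


def cholesky(a: Matrix) -> Matrix:
    """Return lower-triangular L with L * L^T == a (a must be positive definite)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def log_density(x: Sequence[float], mu: Vector, low: Matrix) -> float:
    """log N(x | mu, Sigma), where low is the Cholesky factor of Sigma."""
    d = len(mu)
    # Forward substitution solves low * z = x - mu; then z . z is the
    # squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu).
    z: Vector = []
    for i in range(d):
        s = sum(low[i][k] * z[k] for k in range(i))
        z.append((x[i] - mu[i] - s) / low[i][i])
    maha_sq = sum(v * v for v in z)
    # log det(Sigma) = 2 * sum(log of the diagonal of the Cholesky factor).
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(d))
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + maha_sq)


def flag_anomalies(reference, candidates, epsilon: float) -> List[bool]:
    """Fit on reference rows; flag candidates whose density is below epsilon."""
    mu, cov = fit_gaussian(reference)
    low = cholesky(cov)
    log_eps = math.log(epsilon)
    return [log_density(x, mu, low) < log_eps for x in candidates]


# Hypothetical (feature_1, feature_2) pairs per scan, for illustration only.
reference = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [1.1, 2.2],
             [1.0, 1.8], [0.8, 2.0], [1.1, 2.0], [0.95, 1.95]]
print(flag_anomalies(reference, [[1.0, 2.0], [5.0, 9.0]], epsilon=1e-3))
# prints [False, True]
```

Comparing log densities rather than raw densities avoids floating-point underflow for records far from the mean. The epsilon used here is arbitrary; in practice it would need to be tuned, for instance against a small set of manually reviewed scans.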

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program