All Times EDT

Abstract Details

Activity Number: 205 - Applications of Machine Learning
Type: Contributed
Date/Time: Tuesday, August 4, 2020 : 10:00 AM to 2:00 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #313287
Title: Improving Productionized Insights in Machine Learning Models Through Data-Quality Quantification
Author(s): Christopher Barbour* and Paul Harmon and Eric Loftsgaarden
Companies: Atrium and Atrium and Atrium
Keywords: Predictive Modeling; Data Quality; Machine Learning Tools; Measurement Error; Enterprise Data

A large portion of the enterprise data collected today has gaps in quality. Poor quality in the collection and governance of enterprise data limits the impact that statistical and machine learning models can have on business processes. A methodological toolkit that quantifies the amount of ‘data contamination’ can guide specific improvements and recommendations for data collection, storage, and governance. It can also improve the relevance of the insights and predictions from models built on such data. This talk begins by reviewing previous methodology that attempts to address this issue, along with example data that demonstrate different aspects of data contamination. A simulation study and examples from real-world data then demonstrate the proposed methodology and illustrate the improvements it makes possible in predictive modeling, primarily by estimating less biased relationships between predictors and model outcomes. Challenges and future directions are also discussed.
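
As an illustration of why measurement error in a predictor biases an estimated relationship, the following minimal sketch simulates classical additive measurement error and shows the resulting attenuation of a regression slope. It is not the authors' methodology: the function names (ols_slope, simulate), the parameter values, and the reliability-ratio correction are all assumptions chosen for illustration, and the correction assumes the measurement-error variance is known, which is rarely true for real enterprise data.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, true_slope=2.0, noise_sd=1.0, error_sd=1.0, seed=0):
    """Return (naive, corrected) slope estimates under classical measurement error.

    All defaults are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    # True predictor with unit variance; in practice it is never observed.
    x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # The outcome depends on the true predictor.
    y = [true_slope * x + rng.gauss(0.0, noise_sd) for x in x_true]
    # The recorded predictor is the true value plus independent noise.
    x_obs = [x + rng.gauss(0.0, error_sd) for x in x_true]

    # Regressing on the contaminated predictor shrinks the slope toward zero.
    naive = ols_slope(x_obs, y)

    # Reliability ratio var(x_true) / var(x_obs), assumed known here.
    reliability = 1.0 / (1.0 + error_sd ** 2)
    corrected = naive / reliability
    return naive, corrected


naive, corrected = simulate()
print(f"naive slope: {naive:.2f}, corrected slope: {corrected:.2f}")
```

With these defaults the reliability ratio is 0.5, so the naive slope should land near half of the true slope of 2.0, and dividing by the reliability ratio should bring the estimate back close to 2.0.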

Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program