Keywords: reproducibility, abstraction, workflow, machine learning competition, machine learning prediction challenge
We introduce an Abstraction for Improving Machine learning (AIM) that leverages workflow and system information to improve prediction outcomes and enable meaningful comparisons of ML pipelines. We implement AIM for a well-known acute leukemia classification problem and apply it in the ML Prediction Challenge setting. The abstraction is customizable for each challenge to support creativity and varied approaches, but is fixed within a challenge to enable comparability and the modular re-use of components of the ML prediction pipeline. AIM provides three direct benefits: 1) the sources of outcome differences between ML pipelines can be more efficiently traced to differences at the implementation level, 2) improvements can be made to specific aspects of the pipeline, and 3) the reuse of components across pipelines is facilitated. AIM provides a structured way to evaluate algorithm performance at the level of well-defined pipeline components. It also permits the visualization of steps in the pipeline, from preprocessing to prediction. The AIM cancer classification application we present demonstrates the crucial need for an abstraction layer for ML pipelines to enable reproducibility and to guide the rapid resolution of ML prediction differences.
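The idea of a fixed, component-level pipeline interface can be sketched minimally as follows. This is purely an illustration under our own assumptions; `Stage`, `run_pipeline`, and the stage names are hypothetical and not part of the AIM implementation described above.

```python
# Hypothetical sketch of an AIM-style abstraction: every stage exposes the
# same fixed interface, so stages can be swapped, reused, and compared
# across competing pipelines within a challenge.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Stage:
    name: str                      # e.g. "preprocess" or "predict"
    run: Callable[[list], list]    # fixed signature: data in, data out

def run_pipeline(stages: List[Stage], data: list) -> list:
    """Apply each stage in order. Because all pipelines share this
    structure, outcome differences can be traced to individual stages
    rather than debugged end-to-end."""
    for stage in stages:
        data = stage.run(data)
    return data

# Two pipelines that differ only in preprocessing can be compared
# stage-by-stage; the predict stage is reused unchanged.
scale = Stage("preprocess", lambda xs: [x / 10 for x in xs])
threshold = Stage("predict", lambda xs: [int(x > 0.5) for x in xs])
print(run_pipeline([scale, threshold], [3, 7]))  # → [0, 1]
```

The design choice here mirrors the abstract's point: the interface is fixed within a challenge (every stage maps data to data), while the contents of each stage remain fully customizable.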