Abstract Details

Activity Number: 425 - SPEED: Reliable Statistical Learning and Data Science
Type: Contributed
Date/Time: Tuesday, August 1, 2017 : 3:05 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #325261
Title: Learning from Imbalanced Data: a Review of Some Existing Methodologies
Author(s): Josephine Akosa* and Melinda McCann
Companies: Oklahoma State University and Oklahoma State University
Keywords: Imbalanced learning; predictive accuracy; evaluation metrics; cost-sensitive measures; classification; sampling
Abstract:

Classification of imbalanced datasets is one of the biggest challenges encountered in data mining. Class imbalance severely compromises model learning because classifiers tend to be biased towards the prevalent class. Additionally, the evaluation of a model's accuracy is jeopardized by the scarcity of minority-class data. Simulation studies are used to analyze three re-sampling algorithms (over-sampling, under-sampling, and SMOTE) and several evaluation metrics for assessing the effectiveness of a classifier on imbalanced data. The results suggest that, when the data are imbalanced, model evaluation metrics may reveal more about the distribution of classes than about the actual performance of models. Some of the classification models were also found to be very sensitive to imbalance and to perform poorly in such cases. The final decision in model selection should consider a combination of different metrics instead of relying on only one. To avoid or minimize imbalance-biased performance estimates, we recommend reporting both the obtained metric values and the degree of imbalance in the data.
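
As an illustration of the point about metrics reflecting the class distribution, the following is a minimal Python sketch and not the authors' simulation code. It assumes a hypothetical toy dataset with 95 negatives and 5 positives, scores a classifier that always predicts the prevalent class, and shows plain random over-sampling and under-sampling. All function names are illustrative, and SMOTE is omitted because it requires nearest-neighbor interpolation.

```python
import random

def metrics(y_true, y_pred, positive=1):
    """Compute several metrics from the binary confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": recall,
        "specificity": specificity,
        "balanced_accuracy": (recall + specificity) / 2,
        "f1": f1,
    }

def split_indices(y, minority):
    """Return (minority indices, majority indices) for a binary label list."""
    mino = [i for i, label in enumerate(y) if label == minority]
    majo = [i for i, label in enumerate(y) if label != minority]
    return mino, majo

def random_oversample(X, y, minority=1, seed=0):
    """Duplicate random minority rows until both classes are equal in size."""
    rng = random.Random(seed)
    mino, majo = split_indices(y, minority)
    extra = [rng.choice(mino) for _ in range(len(majo) - len(mino))]
    idx = majo + mino + extra
    return [X[i] for i in idx], [y[i] for i in idx]

def random_undersample(X, y, minority=1, seed=0):
    """Keep a random majority subset of the same size as the minority class.

    Assumes the minority class is no larger than the majority class.
    """
    rng = random.Random(seed)
    mino, majo = split_indices(y, minority)
    idx = rng.sample(majo, len(mino)) + mino
    return [X[i] for i in idx], [y[i] for i in idx]

# Hypothetical toy data: 95 negatives, 5 positives (5% minority class).
y_true = [0] * 95 + [1] * 5
X = [[float(i)] for i in range(100)]

# A "classifier" that always predicts the prevalent class.
always_majority = [0] * len(y_true)
m = metrics(y_true, always_majority)
print(m)

X_over, y_over = random_oversample(X, y_true)
X_under, y_under = random_undersample(X, y_true)
print(len(y_over), len(y_under))
```

In this sketch the always-majority predictor has an accuracy equal to the majority-class proportion while its recall and F1 are zero and its balanced accuracy is 0.5, which is one reason to report several metrics together with the degree of imbalance.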


Authors who are presenting talks have a * after their name.

Back to the full JSM 2017 program

 
 
Copyright © American Statistical Association