
All Times EDT

Abstract Details

Activity Number: 288 - SLDS CSpeed 5
Type: Contributed
Date/Time: Wednesday, August 11, 2021 : 1:30 PM to 3:20 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #318545
Title: Classification for Imbalanced Data
Author(s): Renxiong Liu* and Yunzhang Zhu
Companies: Ohio State University and Ohio State University
Keywords: imbalanced data; classification; model misspecification; recall and precision
Abstract:

Class imbalance is common in real classification problems. Traditional evaluation metrics such as the misclassification error rate are not very informative when the data are highly imbalanced. As an alternative, recall and precision are often used to assess performance. In this work, we formally introduce the notion of optimal classifiers in terms of precision and recall over an arbitrary model space. We argue that any classification method implemented over the given model space should target the optimal classifiers, leading to a minimal requirement which we call $\mathcal H$-consistency. Next, we propose a novel two-step procedure to estimate optimal classifiers and show its $\mathcal H$-consistency over a general model space, which allows for possible model misspecification. Efficient computational methods are also developed for the proposed two-step procedure. Moreover, we show that cost-sensitive learning, a popular method for imbalanced data classification, is not $\mathcal H$-consistent and may miss some optimal classifiers when the model is incorrectly specified. We demonstrate the efficacy of our proposed method through both simulations and a real data analysis.
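
The abstract's opening point, that the misclassification error rate says little under heavy imbalance while precision and recall are more telling, can be illustrated with a minimal sketch. The data and both classifiers below are hypothetical and chosen only for illustration; they are not taken from the authors' simulations, and the sketch does not implement the proposed two-step procedure. Returning 0.0 for precision when nothing is predicted positive is a convention assumed here.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def error_rate(y_true, y_pred):
    """Fraction of misclassified points."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (fp + fn) / (tp + fp + fn + tn)


def precision(y_true, y_pred):
    """tp / (tp + fp); 0.0 by assumed convention if nothing is predicted positive."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


def recall(y_true, y_pred):
    """tp / (tp + fn); 0.0 by assumed convention if there are no positives."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


# Hypothetical imbalanced data: 10 positives followed by 990 negatives.
y_true = [1] * 10 + [0] * 990

# Classifier A (hypothetical): always predicts the majority class.
always_negative = [0] * 1000

# Classifier B (hypothetical): flags 20 points, 8 of which are true positives.
flagging = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 978

for name, y_pred in [("always negative", always_negative), ("flagging", flagging)]:
    print(
        name,
        "error:", error_rate(y_true, y_pred),
        "precision:", precision(y_true, y_pred),
        "recall:", recall(y_true, y_pred),
    )
```

In this toy setting the trivial classifier has the lower error rate (10 errors out of 1000 versus 14 out of 1000) yet finds none of the positives, whereas the flagging classifier recovers 8 of the 10 positives with 8 of its 20 flags correct. Error rate alone would prefer the classifier that is useless for the minority class.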


Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program