Abstract Details

Activity Number: 432
Type: Contributed
Date/Time: Tuesday, August 2, 2016 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #321061
Title: Feature Selection for Class-Imbalanced Data Using Binormal Precision-Recall Curves
Author(s): Zhongkai Liu* and Howard Bondell
Companies: North Carolina State University and North Carolina State University
Keywords: Variable selection; Imbalanced data; Precision-Recall curve; Binormal framework

Feature (variable) selection on imbalanced data, i.e., data with a large skew in the class distribution, is a challenging problem. In this paper, we propose a regularized binormal Precision-Recall algorithm for variable selection in the classification context. The algorithm has two stages. The first stage computes the area under the Precision-Recall curve (AUCPR) in a binormal framework. The second stage applies the threshold gradient descent regularization (TGDR) method, with the binormal AUCPR as the criterion, to perform variable selection. The proposed approach works well, especially on class-imbalanced data sets. We demonstrate, via both simulations and real data analysis, that our method outperforms the corresponding method based on the area under the ROC curve.
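
The two stages can be illustrated with a minimal sketch. It is not the authors' implementation, which the abstract does not spell out. It assumes that classifier scores are normally distributed within each class (the usual binormal assumption), evaluates the AUCPR by numerical integration over the decision threshold, and shows one generic TGDR-style update in which only coefficients whose gradient magnitude is within a fraction `tau` of the largest are moved. The function names, the parameters (`prevalence`, `n_grid`, `tau`, `step`), and the choice of trapezoidal integration are all assumptions made for illustration.

```python
import math


def norm_sf(z):
    """Upper-tail probability P(Z > z) of the standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def binormal_aucpr(mu_pos, sd_pos, mu_neg, sd_neg, prevalence, n_grid=20001):
    """Area under the Precision-Recall curve under a binormal score model.

    Assumes scores ~ N(mu_pos, sd_pos^2) in the positive class and
    N(mu_neg, sd_neg^2) in the negative class; `prevalence` is the
    proportion of positives. Integrates precision over recall with the
    trapezoidal rule (an illustrative choice, not the authors' method).
    """
    lo = min(mu_pos - 8 * sd_pos, mu_neg - 8 * sd_neg)
    hi = max(mu_pos + 8 * sd_pos, mu_neg + 8 * sd_neg)
    step = (hi - lo) / (n_grid - 1)
    area = 0.0
    prev_recall = prev_precision = None
    for i in range(n_grid):
        c = hi - i * step  # thresholds descend, so recall increases
        recall = norm_sf((c - mu_pos) / sd_pos)
        fpr = norm_sf((c - mu_neg) / sd_neg)
        denom = prevalence * recall + (1 - prevalence) * fpr
        precision = prevalence * recall / denom if denom > 0 else 1.0
        if prev_recall is not None:
            area += 0.5 * (precision + prev_precision) * (recall - prev_recall)
        prev_recall, prev_precision = recall, precision
    return area


def tgdr_step(beta, grad, tau, step):
    """One threshold-gradient ascent update (hypothetical helper).

    Only coordinates with |grad_j| >= tau * max_k |grad_k| are updated;
    tau near 1 moves very few coefficients (sparser solutions), while
    tau = 0 reduces to ordinary gradient ascent.
    """
    g_max = max(abs(g) for g in grad)
    return [
        b + step * g if abs(g) >= tau * g_max else b
        for b, g in zip(beta, grad)
    ]
```

With identical score distributions in both classes, precision equals the prevalence at every threshold, so the AUCPR collapses to the prevalence. This dependence on the class ratio is what makes the AUCPR sensitive to imbalance, whereas the area under the ROC curve does not depend on it.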

Authors who are presenting talks have a * after their name.

Copyright © American Statistical Association