JSM 2014 Home

Abstract Details

Activity Number: 509
Type: Contributed
Date/Time: Wednesday, August 6, 2014 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Learning and Data Mining
Abstract #312398
Title: Classification for Highly Unbalanced Data by Maximizing Area Under Precision-Recall Curve
Author(s): Lixia Zhang*+ and Howard D. Bondell
Companies: North Carolina State University and North Carolina State University
Keywords: Classification ; Variable Selection ; Precision-Recall Curve
Abstract:

The Precision-Recall (PR) curve is an alternative to the Receiver Operating Characteristic (ROC) curve for evaluating classifier performance. The area under the curve is the most commonly used summary of either curve. However, when the classes are highly unbalanced, most of the area under the ROC curve corresponds to regions of very low specificity, so it is not a good measure of performance. The partial area under the ROC curve can be used instead, but the user must specify a threshold on the specificity to consider. Procedures that select linear combinations of biomarkers by maximizing the area under the ROC curve can perform poorly in these unbalanced cases. The PR curve avoids the low-specificity regions without requiring the choice of a threshold. Maximizing the area under the PR curve leads to classifiers with a lower false discovery rate (FDR) and thus overcomes the shortcomings of the ROC curve. In this paper, we propose a nonparametric approach to estimating the optimal linear combination that maximizes the area under the PR curve. A permutation-based forward selection method is included to allow selection of the relevant biomarkers.
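
As a rough illustration of the two summaries compared in the abstract, the sketch below computes the empirical area under the PR curve and the empirical area under the ROC curve for scores produced by a linear combination of biomarkers. It is not the authors' estimator or selection procedure. The function names, the step-wise (average precision) definition of the PR area, and the assumption that scores are untied are all choices made here for illustration only.

```python
import numpy as np


def combination_scores(X, beta):
    """Score each subject by a linear combination of its biomarkers (hypothetical helper)."""
    return np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)


def pr_auc(scores, labels):
    """Step-wise empirical area under the PR curve (average precision).

    Assumes labels are 0/1 with at least one positive and that scores are untied.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores, kind="stable")  # highest score first
    y = labels[order]
    true_pos = np.cumsum(y)
    # Precision after each cutoff; this equals 1 minus the empirical FDR there.
    precision = true_pos / np.arange(1, len(y) + 1)
    # Average the precision over the ranks at which a positive is recovered.
    return float(np.sum(precision * y) / y.sum())


def roc_auc(scores, labels):
    """Empirical area under the ROC curve via the Mann-Whitney statistic."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]  # every positive-negative pair
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))


if __name__ == "__main__":
    # Toy data (made up for this sketch): 2 positives among 5 subjects.
    X = [[0.9, 0.0], [0.8, 0.0], [0.7, 0.0], [0.6, 0.0], [0.5, 0.0]]
    labels = [1, 0, 1, 0, 0]
    s = combination_scores(X, beta=[1.0, 0.0])
    print("PR AUC :", pr_auc(s, labels))
    print("ROC AUC:", roc_auc(s, labels))
```

A search over `beta` that maximizes `pr_auc(combination_scores(X, beta), labels)` would be one way to build on this objective, but the optimization and the permutation-based forward selection described in the abstract are not reproduced here.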


Authors who are presenting talks have a * after their name.
