Abstract Details
Activity Number: 509
Type: Contributed
Date/Time: Wednesday, August 6, 2014, 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Learning and Data Mining
Abstract #312398
Title: Classification for Highly Unbalanced Data by Maximizing Area Under Precision-Recall Curve
Author(s): Lixia Zhang*+ and Howard D. Bondell
Companies: North Carolina State University and North Carolina State University
Keywords: Classification; Variable Selection; Precision-Recall Curve
Abstract:
The Precision-Recall (PR) curve is an alternative to the Receiver Operating Characteristic (ROC) curve for evaluating classifier performance. The area under the curve is the most commonly used summary of either curve. However, when the classes are highly unbalanced, most of the area under the ROC curve corresponds to regions of very low specificity, so it is not a good measure of performance. The partial area under the ROC curve can be used instead, but the user must specify a threshold on the specificity. Procedures that select linear combinations of biomarkers by maximizing the area under the ROC curve can perform poorly in these unbalanced cases. The PR curve avoids the low-specificity regions without requiring a threshold to be chosen. Maximizing the area under the PR curve leads to classifiers with a lower false discovery rate (FDR) and thus overcomes these shortcomings of the ROC curve. In this paper, we propose a nonparametric approach to estimating the optimal linear combination that maximizes the area under the PR curve. A permutation-based forward selection method is included to allow selection of the relevant biomarkers.
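To make the two summaries concrete, here is a minimal Python/NumPy sketch. It assumes scores are ranked so that higher values indicate the positive class. It computes the empirical area under the PR curve as average precision and the ROC area through the Mann-Whitney statistic, then picks a linear combination of markers by a crude random search over unit-norm weight vectors. The function names (`average_precision`, `roc_auc`, `fit_linear_combination`) and the random-search strategy are illustrative assumptions. This is not the authors' nonparametric estimator, and the permutation-based forward selection step is not shown.

```python
import numpy as np


def average_precision(y, s):
    """Empirical area under the PR curve, computed as average precision.

    y: 0/1 labels (at least one positive); s: scores, higher = more likely positive.
    Ties are broken by input order (stable sort), which is an arbitrary choice.
    """
    y = np.asarray(y)
    order = np.argsort(-np.asarray(s, dtype=float), kind="mergesort")
    y_sorted = y[order]
    tp = np.cumsum(y_sorted)                        # true positives among the top k
    precision = tp / np.arange(1, len(y_sorted) + 1)
    return float(precision[y_sorted == 1].mean())   # precision at each positive's rank


def roc_auc(y, s):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    y, s = np.asarray(y), np.asarray(s, dtype=float)
    diff = s[y == 1][:, None] - s[y == 0][None, :]  # all positive-negative pairs
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def fit_linear_combination(X, y, n_directions=2000, seed=0):
    """Hypothetical helper: random search for weights w maximizing the AP of X @ w.

    AP depends only on the ranking of X @ w, so w is searched on the unit sphere.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_directions, X.shape[1]))
    W /= np.linalg.norm(W, axis=1, keepdims=True)   # normalize each candidate direction
    aps = [average_precision(y, X @ w) for w in W]
    best = int(np.argmax(aps))
    return W[best], aps[best]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, n_pos = 2000, 20                      # 1% positives: highly unbalanced
    y = np.zeros(n, dtype=int)
    y[:n_pos] = 1
    X = rng.standard_normal((n, 3))
    X[y == 1, 0] += 2.5                      # only the first marker is informative
    w, ap = fit_linear_combination(X, y)
    print("weights:", np.round(w, 2))
    print("AP:", round(ap, 3), "ROC AUC:", round(roc_auc(y, X @ w), 3))
```

The search is random only because the empirical average precision is piecewise constant in the weights: it changes only when the ranking changes, so plain gradient methods do not apply directly. With more than a few markers, a smarter search or a smooth surrogate would be needed.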
Authors who are presenting talks have a * after their name.
2014 JSM Online Program
Copyright © American Statistical Association.