Abstract:
Binary classification on imbalanced data, i.e., data with a large skew in the class distribution, is a challenging problem. Classifiers based on the receiver operating characteristic (ROC) curve have been regarded as the gold standard in binary classification. With imbalanced data, however, the ROC curve tends to give an overly optimistic view of performance. To address this weakness, we propose a Precision-Recall (PR) curve based approach under a binormal assumption, where the key idea is to estimate the classifier that maximizes the area under the binormal Precision-Recall curve. The asymptotic distribution of the estimate is established, and simulation as well as real data results indicate that the binormal Precision-Recall method outperforms approaches based on the area under the ROC curve in terms of false discovery rate and asymptotic variance.
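
To illustrate why the ROC curve can look optimistic under class imbalance, the following is a minimal sketch, not the estimator proposed in the abstract. It assumes classifier scores are normal within each class (the binormal model) and compares the closed-form binormal ROC area with a numerically integrated area under the binormal PR curve. The function names, parameter values, and the prevalence argument are illustrative assumptions, not taken from the source.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def binormal_roc_auc(mu0, s0, mu1, s1):
    """Area under the binormal ROC curve (closed form).

    Negatives score ~ N(mu0, s0^2), positives ~ N(mu1, s1^2).
    Note that the class prevalence does not enter this quantity.
    """
    return norm_cdf((mu1 - mu0) / math.sqrt(s0 ** 2 + s1 ** 2))


def binormal_pr_auc(mu0, s0, mu1, s1, prevalence, n_grid=20000):
    """Area under the binormal PR curve by trapezoidal integration.

    `prevalence` is the assumed proportion of positives (hypothetical input).
    """
    lo = min(mu0 - 8 * s0, mu1 - 8 * s1)
    hi = max(mu0 + 8 * s0, mu1 + 8 * s1)
    area = 0.0
    prev_recall = prev_precision = None
    for i in range(n_grid + 1):
        c = hi - (hi - lo) * i / n_grid           # sweep threshold high -> low
        recall = 1.0 - norm_cdf((c - mu1) / s1)   # true positive rate
        fpr = 1.0 - norm_cdf((c - mu0) / s0)      # false positive rate
        denom = prevalence * recall + (1.0 - prevalence) * fpr
        precision = prevalence * recall / denom if denom > 0 else 1.0
        if prev_recall is not None:
            area += (recall - prev_recall) * 0.5 * (precision + prev_precision)
        prev_recall, prev_precision = recall, precision
    return area


if __name__ == "__main__":
    # Same score distributions, different class balance (illustrative values).
    print("ROC AUC:", binormal_roc_auc(0.0, 1.0, 1.0, 1.0))
    for pi in (0.5, 0.1, 0.01):
        print("prevalence", pi, "PR AUC:", binormal_pr_auc(0.0, 1.0, 1.0, 1.0, pi))
```

Under these assumptions the ROC area stays fixed as the positive class becomes rarer, while the PR area shrinks, which is the sense in which the ROC summary can be overly optimistic on imbalanced data.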
ASA Meetings Department
732 North Washington Street, Alexandria, VA 22314
(703) 684-1221 • meetings@amstat.org
Copyright © American Statistical Association.