Abstract:
|
Feature (variable) selection on imbalanced data, i.e. data with a large skew in the class distribution, is a challenging problem. In this paper, we propose a regularized binormal Precision-Recall algorithm for variable selection in classification. The algorithm has two stages. In the first stage, we compute the area under the Precision-Recall curve (AUCPR) in a binormal framework. In the second stage, we use the binormal AUCPR as the criterion for the threshold gradient descent regularization (TGDR) method, which performs the variable selection. The proposed approach works well, especially on class-imbalanced data sets. We demonstrate, through both simulations and real data analysis, that our method outperforms the approach based on the area under the ROC curve.
|
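The two stages summarized above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract gives no formulas, so the code assumes the usual binormal model (classifier scores are normal within each class) and evaluates the AUCPR by numerical integration, and it assumes the standard TGDR rule in which only coefficients whose gradient magnitude is within a factor `tau` of the largest one are updated. The function names `binormal_aucpr` and `tgdr_step`, and the parameters `grid`, `tau`, and `step`, are hypothetical.

```python
import math


def norm_sf(z):
    """Upper tail probability 1 - Phi(z) of the standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def binormal_aucpr(mu1, s1, mu0, s0, pi, grid=400):
    """Area under the PR curve under an assumed binormal model.

    Positive scores ~ N(mu1, s1^2), negative scores ~ N(mu0, s0^2),
    and pi is the proportion of positives (the class prior).
    """
    lo = min(mu0 - 6.0 * s0, mu1 - 6.0 * s1)
    hi = max(mu0 + 6.0 * s0, mu1 + 6.0 * s1)
    area, prev_rec, prev_prec = 0.0, 0.0, None
    # Sweep the threshold from high to low so recall grows from ~0 to ~1.
    for k in range(grid + 1):
        c = hi - (hi - lo) * k / grid
        rec = norm_sf((c - mu1) / s1)   # recall = true positive rate
        fpr = norm_sf((c - mu0) / s0)   # false positive rate
        denom = pi * rec + (1.0 - pi) * fpr
        prec = pi * rec / denom if denom > 0.0 else 1.0
        if prev_prec is not None:
            # Trapezoidal rule on precision as a function of recall.
            area += 0.5 * (prec + prev_prec) * (rec - prev_rec)
        prev_rec, prev_prec = rec, prec
    return area


def tgdr_step(beta, grad, tau, step):
    """One thresholded gradient ascent update (assumed TGDR rule).

    A coefficient moves only if |grad_j| >= tau * max_k |grad_k|.
    tau = 0 updates every coefficient; tau = 1 updates only the largest.
    """
    cut = tau * max(abs(g) for g in grad)
    return [b + step * g if abs(g) >= cut else b
            for b, g in zip(beta, grad)]


# Illustrative values only: a rare positive class (5%) with modest separation.
print(binormal_aucpr(mu1=1.0, s1=1.0, mu0=0.0, s0=1.0, pi=0.05))
# One update where only the two largest gradient entries pass the threshold.
print(tgdr_step([0.0, 0.0, 0.0], [1.0, 0.5, -0.95], tau=0.9, step=0.1))
```

In a full procedure, the gradient passed to `tgdr_step` would be the gradient of the binormal AUCPR with respect to the coefficients of a linear score, and the update would be iterated, with the number of iterations and `tau` acting as tuning parameters. Coefficients that never pass the threshold stay at zero, which is what makes the procedure a variable selection method.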