Abstract:
|
Cost-sensitive (CS) classification methods account for asymmetric misclassification costs and are widely applied in real-world problems such as medical diagnosis, transaction monitoring, and fraud detection. Current approaches to binary CS learning assign different weights to the two classes, but they do so in non-unified ways. The three main approaches are rebalancing the sample before training, changing the objective function used for training, and adjusting the estimated posterior class probabilities after training. Moreover, existing CS learning work has focused only on improving the empirical classification errors or costs incurred under the assigned weights, while overlooking the resulting changes in population classification errors. We propose an umbrella algorithm to estimate the population type I error control achieved by multiple binary CS learning approaches. Our algorithm establishes, for the first time, a connection between CS learning and the Neyman-Pearson classification paradigm, which minimizes the population type II error while enforcing an upper bound on the population type I error.
|
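To make the third CS approach in the abstract concrete (adjusting estimated posterior class probabilities after training), the following is a minimal sketch. It is not the umbrella algorithm proposed in the paper. It assumes the standard cost-based decision rule, in which an observation is assigned to class 1 when its estimated posterior P(Y = 1 | x) is at least cost_fp / (cost_fp + cost_fn). The names `cost_fp`, `cost_fn`, and all function names are hypothetical, as is the convention that class 0 is the class whose misclassification counts as a type I error. The error computed here is empirical, so it only approximates the population type I error that the abstract is concerned with.

```python
import numpy as np


def cost_sensitive_threshold(cost_fp: float, cost_fn: float) -> float:
    """Threshold on the estimated P(Y=1 | x) implied by the two costs.

    cost_fp: assumed cost of predicting 1 when the truth is 0 (type I error).
    cost_fn: assumed cost of predicting 0 when the truth is 1 (type II error).
    """
    return cost_fp / (cost_fp + cost_fn)


def classify(posteriors, cost_fp: float, cost_fn: float) -> np.ndarray:
    """Predict class 1 where the estimated posterior reaches the threshold."""
    threshold = cost_sensitive_threshold(cost_fp, cost_fn)
    return (np.asarray(posteriors, dtype=float) >= threshold).astype(int)


def empirical_type1_error(y_true, y_pred) -> float:
    """Fraction of true class-0 observations that were predicted as class 1."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    is_class0 = y_true == 0
    return float(np.mean(y_pred[is_class0] == 1))


if __name__ == "__main__":
    # Toy posteriors and labels, made up purely for illustration.
    posteriors = [0.1, 0.4, 0.6, 0.9, 0.3, 0.7]
    labels = [0, 0, 0, 1, 1, 1]
    for cost_fp in (1.0, 4.0):
        preds = classify(posteriors, cost_fp=cost_fp, cost_fn=1.0)
        print(cost_fp, preds, empirical_type1_error(labels, preds))
```

Raising `cost_fp` relative to `cost_fn` raises the threshold and lowers the empirical type I error on this toy sample, but the costs alone do not say what population type I error level is actually achieved. That gap is what motivates linking CS learning to the Neyman-Pearson paradigm.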