Abstract:
|
Label noise in training data has been a long-standing problem in supervised learning applications, degrading the effectiveness of many widely used classification methods. Important real-world applications, such as medical diagnosis and cybersecurity, call for solutions to the label noise problem under the Neyman-Pearson paradigm, which constrains the more severe type of error (e.g., the type I error) under a preferred level while minimizing the other (e.g., the type II error). Surprisingly, even when training labels are corrupted, the usual Neyman-Pearson classifiers, which ignore the label noise in the training stage, can still control the type I error with high probability. The price to pay, however, is an overly conservative type I error and a significant drop in power (i.e., 1 - type II error). Assuming knowledge of the corruption severity, we propose the first theory-backed algorithm that adapts most state-of-the-art classification methods to training label noise and constructs classifiers that not only control the type I error under the desired level with high probability but also achieve good power.
|