Abstract Details

Activity Number: 310 - Topics of Variable Selection
Type: Contributed
Date/Time: Tuesday, July 31, 2018 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Learning and Data Science
Abstract #330260
Title: Budget-Constrained Feature Selection for Binary Classification: a Neyman-Pearson Approach
Author(s): Yiling Chen* and Xin Tong and Jingyi Li
Companies: University of California, Los Angeles and University of Southern California and University of California, Los Angeles
Keywords: feature selection; disease diagnosis; type 1 error; false positive control; Neyman-Pearson; machine learning

In biomedical applications such as cancer diagnosis, binary classification often requires asymmetric control of misclassification errors, because misclassifying a diseased patient as healthy and misclassifying a healthy patient as diseased have very different consequences. We previously proposed the Neyman-Pearson (NP) classification paradigm to address such asymmetric classification problems. An important open question is which features are more important under the NP paradigm. Here we propose NP-Rank, a method that ranks features by their type II errors (the less severe type of misclassification error) while their type I errors (the more severe type) are controlled under a user-specified threshold (such as 0.05) with high probability. NP-Rank has desirable theoretical guarantees when used with density plug-in classifiers. Extensive numerical studies show that NP-Rank, used with popular classification methods such as logistic regression, outperforms traditional ranking methods under the classical paradigm. A real-data application to DNA methylation profiles from breast cancer patients further demonstrates the advantages of NP-Rank.
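
The ranking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' NP-Rank implementation, which the abstract does not specify. It assumes each feature is scored on its own by its raw (sign-oriented) value, and it sets the threshold with an order-statistic rule from the NP classification literature so that the type I error exceeds alpha with probability at most delta. The names np_threshold_index and np_rank, the half-and-half data split, and the parameter delta are all hypothetical choices made for illustration.

```python
import math
import numpy as np

def np_threshold_index(n, alpha, delta):
    """Return the smallest 1-based rank k such that thresholding at the k-th
    smallest of n held-out class-0 scores has type I error > alpha with
    probability at most delta; return None if n is too small."""
    for k in range(1, n + 1):
        # Binomial upper bound on P(type I error > alpha) for rank k.
        bound = sum(math.comb(n, j) * (1 - alpha) ** j * alpha ** (n - j)
                    for j in range(k, n + 1))
        if bound <= delta:
            return k
    return None

def np_rank(X0, X1, alpha=0.05, delta=0.05, seed=0):
    """Rank features by estimated type II error under type I control.

    X0: class-0 samples (the class whose misclassification is more severe).
    X1: class-1 samples. Both are arrays of shape (n_samples, n_features).
    Returns (feature indices from best to worst, type II error estimates).
    """
    rng = np.random.default_rng(seed)
    idx0, idx1 = rng.permutation(len(X0)), rng.permutation(len(X1))
    h0, h1 = len(X0) // 2, len(X1) // 2
    # Assumption: one half orients the score, the other half is held out.
    X0_fit, X0_thr = X0[idx0[:h0]], X0[idx0[h0:]]
    X1_fit, X1_eval = X1[idx1[:h1]], X1[idx1[h1:]]

    k = np_threshold_index(len(X0_thr), alpha, delta)
    if k is None:
        raise ValueError("too few class-0 samples for this alpha and delta")

    type2 = np.empty(X0.shape[1])
    for j in range(X0.shape[1]):
        # Orient the feature so that larger scores point toward class 1.
        sign = 1.0 if X1_fit[:, j].mean() >= X0_fit[:, j].mean() else -1.0
        threshold = np.sort(sign * X0_thr[:, j])[k - 1]
        # Type II error: class-1 samples that are not predicted as class 1.
        type2[j] = np.mean(sign * X1_eval[:, j] <= threshold)
    return np.argsort(type2, kind="stable"), type2
```

Calling np_rank(X0, X1) returns the feature order and the per-feature type II error estimates. With alpha = delta = 0.05, the threshold rule needs at least 59 held-out class-0 samples, which the sketch reports as an error rather than silently relaxing the guarantee.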

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program