Abstract Details

Activity Number: 236 - SLDS Student Paper Awards
Type: Topic Contributed
Date/Time: Monday, July 30, 2018 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #328491 Presentation
Title: PUlasso: High-Dimensional Variable Selection with Presence-Only Data
Author(s): Hyebin Song*
Companies: UW-Madison
Keywords: PU-learning; majorization-minimization; non-convexity; regularization

In various real-world problems, we are presented with positive and unlabelled data, referred to as presence-only responses, in settings where the number of covariates p is large. The combination of presence-only responses and high dimensionality presents both statistical and computational challenges. In this paper, we develop the PUlasso algorithm for variable selection and classification with positive and unlabelled responses. Our algorithm uses the majorization-minimization (MM) framework, which is a generalization of the well-known expectation-maximization (EM) algorithm. In particular, to make our algorithm scalable, we provide two computational speed-ups to the standard EM algorithm. Our theoretical guarantee has two parts: we first show that our algorithm converges to a stationary point, and then prove that any stationary point achieves the minimax optimal mean-squared error of s log p / n, where s is the sparsity of the true parameter. We also demonstrate through simulations that our algorithm outperforms state-of-the-art algorithms in moderate-p settings in terms of classification performance, and that it performs well on a real biochemistry example.
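
To make the MM idea mentioned in the abstract concrete, the following is a minimal sketch of a majorization-minimization loop for an l1-penalized logistic regression. It is not the PUlasso algorithm: it assumes fully labelled 0/1 responses rather than positive and unlabelled ones, and it omits the two speed-ups the abstract refers to. The function names (soft_threshold, logistic_loss, mm_l1_logistic), the penalty level lam, and the simulated data are all hypothetical choices made for illustration. Each iteration replaces the logistic loss by a quadratic upper bound (its curvature is at most the squared spectral norm of X divided by 4n) and minimizes that bound plus the l1 penalty in closed form, so the penalized objective cannot increase from one iteration to the next.

```python
import numpy as np


def soft_threshold(z, t):
    """Closed-form minimizer of 0.5 * (b - z)**2 + t * |b|, applied elementwise."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def logistic_loss(X, y, beta):
    """Mean negative log-likelihood for labels y in {0, 1}."""
    z = X @ beta
    return np.mean(np.logaddexp(0.0, z) - y * z)


def mm_l1_logistic(X, y, lam, n_iter=200):
    """Illustrative MM loop for l1-penalized logistic regression (assumed setup)."""
    n, p = X.shape
    # The Hessian of the mean logistic loss is X^T W X / n with W <= 1/4,
    # so L bounds its largest eigenvalue and defines a quadratic majorizer.
    L = np.linalg.norm(X, 2) ** 2 / (4.0 * n)
    beta = np.zeros(p)
    objective = []
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (prob - y) / n
        # Minimizing the majorizer plus the l1 penalty gives a soft-threshold step.
        beta = soft_threshold(beta - grad / L, lam / L)
        objective.append(logistic_loss(X, y, beta) + lam * np.abs(beta).sum())
    return beta, objective


# Hypothetical simulated data: 3 active covariates out of 50.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -2.0, 1.5]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ beta_true)))).astype(float)

beta_hat, objective = mm_l1_logistic(X, y, lam=0.05)
print("nonzero coefficients:", np.count_nonzero(beta_hat))
print("final penalized objective:", objective[-1])
```

The quadratic bound used here is only one possible majorizer; a presence-only setting would require a different surrogate built from the positive and unlabelled likelihood, which this sketch does not attempt.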

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program