Abstract Details

Activity Number: 311
Type: Contributed
Date/Time: Tuesday, August 2, 2016 : 8:30 AM to 10:20 AM
Sponsor: Biometrics Section
Abstract #320857
Title: Surrogate Aided Unsupervised Recovery of Sparse Signals in Single Index Models for Binary Outcomes Using Extreme Sampling
Author(s): Abhishek Chakrabortty* and Tianxi Cai
Companies: Harvard and Harvard
Keywords: Unsupervised signal recovery ; Binary outcomes ; Single index models ; Surrogate outcome and misclassification error ; Extreme sampling ; Sparsity and LASSO

We consider the regression of a binary outcome Y on a set of (possibly high-dimensional) covariates X, based on a large unlabeled dataset D that contains observations only for X and, additionally, for a 'surrogate' S. While S is not strongly predictive of Y over its entire support, it predicts Y with high accuracy when it assumes extreme values. Such data arise naturally in settings where Y, unlike (X, S), is hard to obtain, a frequent scenario in modern studies involving large databases such as electronic medical records (EMR), where an example of (Y, S) is (a disease outcome, its diagnostic codes). Assuming that Y and S both follow flexible single index models in X, we show that, under sparsity assumptions, the regression parameter of Y on X can be recovered by simply fitting a least squares LASSO to the subset of D lying in the extreme sets of S, with Y imputed using the surrogacy of S. We obtain sharp finite-sample performance guarantees for our estimator, with several interesting implications. We demonstrate the effectiveness of our approach through extensive simulations, where it is found to perform as well as or better than supervised methods based on even 500 observations, followed by an application to a real EMR dataset.
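
The procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`soft_threshold`, `lasso_cd`, `extreme_sample_lasso`), the tail fraction `q`, the penalty `lam`, the rule "lowest tail of S is imputed as Y = 0, highest tail as Y = 1", and the simulated data are all assumptions made here for illustration. The LASSO is solved by plain cyclic coordinate descent using only the Python standard library.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Least squares LASSO via cyclic coordinate descent.

    Minimizes (1/(2n)) * ||y - b0 - X b||^2 + lam * ||b||_1.
    The intercept is handled by centering; only b is returned.
    """
    n, p = len(X), len(X[0])
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_mean[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]          # residual at b = 0
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n
            rho += col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                beta[j] = new
    return beta

def extreme_sample_lasso(X, S, q=0.05, lam=0.05):
    """Keep the lowest and highest q-fractions of S, impute Y, fit LASSO.

    Assumed imputation rule (hypothetical): Y = 0 in the lower tail of S
    and Y = 1 in the upper tail. Requires q < 0.5 so the tails are disjoint.
    """
    n = len(S)
    k = max(1, int(q * n))
    order = sorted(range(n), key=lambda i: S[i])
    idx = order[:k] + order[-k:]
    y_imp = [0.0] * k + [1.0] * k
    X_ext = [X[i] for i in idx]
    return lasso_cd(X_ext, y_imp, lam), idx, y_imp

# Toy unlabeled data: only X and S are observed, Y is never used.
random.seed(0)
n, p = 2000, 8
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
# Assumed surrogate: driven by a sparse index in X plus noise.
S = [row[0] - row[1] + 0.5 * random.gauss(0, 1) for row in X]

beta_hat, idx, y_imp = extreme_sample_lasso(X, S, q=0.05, lam=0.05)
print([round(b, 3) for b in beta_hat])
```

In this toy setup the surrogate depends only on the first two covariates, so one would expect the fitted coefficients to be largest in magnitude there. The choices of `q` and `lam` are arbitrary here; the abstract does not specify how the extreme sets or the penalty level are selected.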

Authors who are presenting talks have a * after their name.

Copyright © American Statistical Association