All Times EDT

Abstract Details

Activity Number: 469 - Topics in Modern Predictive Modeling
Type: Contributed
Date/Time: Thursday, August 6, 2020 : 10:00 AM to 2:00 PM
Sponsor: IMS
Abstract #312803
Title: Predictive Modeling for Positive-Only Data with High-Dimensional Covariates
Author(s): Prabrisha Rakshit* and Zijian Guo and Jinbo Chen and Daniel Herman
Companies: Rutgers The State Univ of NJ and Rutgers The State University of NJ and University of Pennsylvania and University of Pennsylvania Perelman School of Medicine
Keywords: Prediction; High-dimensional; Positive-only data; Anchor variable; EHR phenotyping; Logistic regression
Abstract:

Labeling patients in electronic health records (EHRs) according to whether they have a disease or condition relies on prediction models that use high-dimensional variables derived from EHR data. However, the most readily accessible annotations in EHRs are an incomplete set of gold-standard cases together with non-gold-standard cases. We analyze such "positive-only" data, in which the binary outcome is not observed directly; instead, an anchor variable is observed as a proxy for the outcome. A positive anchor indicates presence of the phenotype, whereas a negative anchor does not determine the true phenotype status. We use a high-dimensional logistic regression model for the gold-standard outcome and introduce a probability model linking the outcome and the anchor variable. We propose a bias-corrected estimator of the case probability and establish its asymptotic normality. Our method imposes no sparsity conditions on either the loading vector or the precision matrix of the random design. We validate our theoretical findings through simulations and a real-data example.
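The abstract does not spell out the probability model linking the outcome Y and the anchor A. Purely for illustration, the sketch below assumes a common positive-unlabeled setup: a non-case never has a positive anchor, and a case has a positive anchor with a constant probability c (the anchor's sensitivity). Under that assumption P(A = 1 | x) = c · P(Y = 1 | x), so the case probability is the anchor probability rescaled by 1/c. The function names and the constant-sensitivity assumption are hypothetical and are not the authors' model. The sketch also omits the high-dimensional fitting and the bias correction.

```python
import math
import random


def expit(t: float) -> float:
    """Logistic function: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


def case_probability(x, beta):
    """P(Y = 1 | x) under a logistic model for the gold-standard outcome Y."""
    return expit(sum(xi * bi for xi, bi in zip(x, beta)))


def anchor_probability(x, beta, sensitivity):
    """P(A = 1 | x), assuming (hypothetically) that A = 1 only when Y = 1
    and that P(A = 1 | Y = 1, x) equals a constant `sensitivity`."""
    return sensitivity * case_probability(x, beta)


def simulate(n, beta, sensitivity, seed=0):
    """Draw (x, y, a) triples. In positive-only data y is latent and only
    (x, a) would be observed; y is kept here so the model can be checked."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in beta]
        y = 1 if rng.random() < case_probability(x, beta) else 0
        # A positive anchor can occur only for a true case.
        a = 1 if (y == 1 and rng.random() < sensitivity) else 0
        data.append((x, y, a))
    return data
```

For example, `simulate(20000, [1.0, -0.5, 0.0], 0.6)` yields data in which every positive anchor is a true case, while roughly 40% of the true cases have a negative anchor. A model fitted to A alone would therefore underestimate the case probability by the factor c.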


Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program