Name: 2020 Joint Statistical Meetings
Start: 2020-08-02T07:00:00+00:00
End: 2020-08-06

All Times EDT

Abstract Details

Activity Number:	190 - Session on Semi-Supervised and Unsupervised Learning
Type:	Contributed
Date/Time:	Tuesday, August 4, 2020 : 10:00 AM to 2:00 PM
Sponsor:	Biometrics Section
Abstract #309713
Title:	Semi-Supervised Learning of Dense High-Dimensional Parameter Through Predictive Surrogates
Author(s):	Jue Hou* and Zijian Guo and Tianxi Cai
Companies:	Harvard T.H. Chan School of Public Health and Rutgers The State University of NJ and Harvard University
Keywords:	dense coefficient; GLM; inference; EHR
Abstract:	Expectations are high for individualized risk prediction that draws on accumulated knowledge of gene-disease associations. While large-scale data linking biobanks and EHRs may provide a sample size sufficient for learning thousands of genetic factors, retrieving exact disease onset information requires labor-intensive chart review, which limits the availability of such gold-standard labels to only a fraction of the subjects. In this paper, we develop a semi-supervised learning (SSL) method for prediction and inference of individual risk when the number of covariates far exceeds the number of labels, without the typical sparsity assumption on the prediction model. We leverage the predictive power of a few surrogates for the missing labels, so that we can predict individual risk with a number of parameters up to the full sample size. Through one-step bias correction with a novel cross-fitting scheme, we construct honest SSL confidence intervals for individual risk with an arbitrary loading. In simulations, our SSL approach outperforms existing supervised methods. We apply the method to predict individual risk of obesity using SNPs.
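The combination of surrogates, one-step bias correction, and cross-fitting can be illustrated in a much simpler setting than the paper's. The sketch below is not the authors' estimator: it assumes a scalar target E[Y], a single continuous surrogate S, labels missing completely at random, and a linear imputation model, and the names `fit_linear` and `ssl_mean_crossfit` are hypothetical. For each fold, an imputation model trained on the other labeled folds is averaged over the unlabeled surrogates and then corrected by its mean residual on the held-out labeled fold, so the imputation model's bias cancels in expectation.

```python
import random
from statistics import mean


def fit_linear(s, y):
    """Least-squares fit of y ~ a + b*s; returns a prediction function."""
    s_bar, y_bar = mean(s), mean(y)
    sxx = sum((si - s_bar) ** 2 for si in s)
    sxy = sum((si - s_bar) * (yi - y_bar) for si, yi in zip(s, y))
    b = sxy / sxx if sxx > 0 else 0.0
    a = y_bar - b * s_bar
    return lambda t: a + b * t


def ssl_mean_crossfit(s_lab, y_lab, s_unlab, n_folds=5, seed=0):
    """Hypothetical toy: cross-fitted, one-step bias-corrected SSL estimate of E[Y].

    Assumes labels are missing completely at random and len(y_lab) >= n_folds.
    """
    idx = list(range(len(y_lab)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    estimates = []
    for k, held_out in enumerate(folds):
        # Train the surrogate-based imputation model on all other labeled folds.
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        model = fit_linear([s_lab[i] for i in train], [y_lab[i] for i in train])
        # Plug-in step: average imputed outcomes over the unlabeled surrogates.
        plug_in = mean(model(t) for t in s_unlab)
        # One-step correction: mean residual on the held-out labeled fold.
        correction = mean(y_lab[i] - model(s_lab[i]) for i in held_out)
        estimates.append(plug_in + correction)
    return mean(estimates)


# Simulated example (hypothetical data): only the first 200 subjects are labeled.
rng = random.Random(1)
s_all = [rng.gauss(0, 1) for _ in range(5000)]
y_all = [2.0 + 1.5 * s + rng.gauss(0, 0.5) for s in s_all]
n_lab = 200
est = ssl_mean_crossfit(s_all[:n_lab], y_all[:n_lab], s_all[n_lab:])
print(round(est, 3), round(mean(y_all[:n_lab]), 3))
```

Holding out each labeled fold from the model used to correct it keeps the residual term unbiased even when the imputation model is misspecified or overfits; the unlabeled surrogates then reduce the variance of the plug-in term relative to the labeled-only mean.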

Authors who are presenting talks have a * after their name.

American Statistical Association