
All Times EDT

Abstract Details

Activity Number: 201 - Nonparametric Statistics Student Paper Competition Presentations
Type: Topic-Contributed
Date/Time: Tuesday, August 10, 2021 : 1:30 PM to 3:20 PM
Sponsor: Section on Nonparametric Statistics
Abstract #317158
Title: High-Dimensional Semi-Supervised Learning: In Search of Optimal Inference of the Mean
Author(s): Yuqian Zhang* and Jelena Bradic
Companies: UCSD and University of California, San Diego
Keywords: Double-robustness; Missing data; Model-lean inference; Coefficient of determination
Abstract:

The semi-supervised setting is widely present in today's massive data repositories. A fundamental challenge therein lies in the disproportion between the size n of the fully observed data and the size of the data with missing outcomes, the latter being significantly larger. An implicit understanding is that additional information ought to lead to improved inference. However, in a semi-supervised setting, it is unclear to what extent this insight holds. We demonstrate that root-n inference for the mean of the outcome is possible while requiring only consistent estimation of the outcome model, possibly at a rate slower than root-n. This solution especially suits models that naturally do not admit root-n consistency, such as high-dimensional, nonparametric, or semi-parametric models. The proposed approach uses a novel k-fold cross-fitting estimator and establishes connections between double robustness and semi-supervised learning.
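The abstract does not spell out the estimator, so the following is only a rough illustration of the general idea of a k-fold cross-fitted semi-supervised mean estimate. It assumes labels are missing completely at random (labeled and unlabeled covariates share one distribution) and uses ordinary least squares as a stand-in outcome model. The names `fit_ols` and `cross_fit_ss_mean` are hypothetical, and the construction "average prediction on the unlabeled data plus average out-of-fold residual on the labeled data" is a common semi-supervised mean construction, not necessarily the authors' exact estimator. The standard error is a simple plug-in that treats the two averages as independent.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns a prediction function.

    Assumption: a low-dimensional linear model stands in for the
    high-dimensional or nonparametric outcome models the abstract targets.
    """
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef


def cross_fit_ss_mean(X_lab, y_lab, X_unlab, k=5, fit=fit_ols, seed=0):
    """Hypothetical k-fold cross-fitted semi-supervised estimate of E[Y].

    Returns (estimate, standard_error).
    """
    n = len(y_lab)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)  # random split of labeled rows

    residuals = np.empty(n)
    unlab_pred = np.zeros(len(X_unlab))
    for idx in folds:
        train = np.setdiff1d(np.arange(n), idx)
        model = fit(X_lab[train], y_lab[train])  # fit without fold idx
        residuals[idx] = y_lab[idx] - model(X_lab[idx])  # out-of-fold residuals
        unlab_pred += model(X_unlab) / k  # average the k fitted models

    # Mean prediction on unlabeled data, corrected by the mean residual.
    estimate = unlab_pred.mean() + residuals.mean()
    # Plug-in SE treating the two averages as independent (a simplification).
    se = np.sqrt(residuals.var(ddof=1) / n
                 + unlab_pred.var(ddof=1) / len(X_unlab))
    return estimate, se


if __name__ == "__main__":
    # Simulated example: n = 200 labeled rows, N = 5000 unlabeled rows, E[Y] = 2.
    rng = np.random.default_rng(1)
    beta = np.array([1.0, -0.5, 0.25])
    X_lab = rng.normal(size=(200, 3))
    y_lab = 2.0 + X_lab @ beta + rng.normal(size=200)
    X_unlab = rng.normal(size=(5000, 3))
    est, se = cross_fit_ss_mean(X_lab, y_lab, X_unlab)
    print(f"estimate {est:.3f}, 95% CI ({est - 1.96 * se:.3f}, {est + 1.96 * se:.3f})")
```

Passing a different `fit` (for example a penalized or tree-based regression returning a prediction function) keeps the same structure. The out-of-fold residual term is the piece that corrects for errors in the fitted outcome model, which mirrors the abstract's point that the outcome model need only be consistent rather than root-n consistent.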


Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program