All Times EDT

Abstract Details

Activity Number: 440 - SLDS CSpeed 8
Type: Contributed
Date/Time: Thursday, August 12, 2021 : 4:00 PM to 5:50 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #318852
Title: Statistical Issues in Principal Component Score Estimation for Exponential Family PCA
Author(s): Ruochen Huang* and Yoonkyung Lee
Companies: Ohio State University and Ohio State University
Keywords: Bias correction; Dimension reduction; Exponential Family; MLE; PCA
Abstract:

Most extensions of standard PCA to exponential family data assume that the natural parameter matrix can be factorized into two low-rank matrices, namely the principal component loadings matrix and the scores matrix. The quality of the component scores is of great importance for downstream tasks such as clustering and regression. When both loadings and scores are treated as fixed and unknown, they are often estimated jointly by maximum likelihood. However, joint estimation tends to inflate the magnitude of the component scores and degrade their quality when the data dimension is fixed. One possible source of this inflation is the bias of the MLE in generalized linear models. We examine the extent of bias in component scores for logistic PCA with binary data. Through simulation studies, we evaluate the effectiveness of some existing bias-reduction methods for the MLE in logistic regression when the loadings are treated as known or are estimated first from training data. In addition, we compare the quality of component scores from joint estimation with those from an alternative formulation of logistic PCA based on the projection of saturated logit parameters.
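
As a rough illustration of the "loadings treated as known" setting described above, the following sketch simulates binary data from a rank-2 logit matrix and estimates the scores row by row. It is not the authors' procedure: the sizes n, d, k, the loadings B, the small ridge term (added only to keep estimates finite when a row is nearly separable), and the plain gradient-ascent solver are all assumptions made for illustration, and the main-effect (centering) term is omitted.

```python
import numpy as np

def sigmoid(theta):
    # Numerically stable logistic function.
    return np.exp(-np.logaddexp(0.0, -theta))

def bernoulli_loglik(X, Theta):
    # Bernoulli log-likelihood with natural (logit) parameter matrix Theta.
    return float(np.sum(X * Theta - np.logaddexp(0.0, Theta)))

def fit_scores_known_loadings(X, B, ridge=1e-3, n_iter=500):
    # Hypothetical helper: gradient ascent on the ridge-penalized
    # log-likelihood over the scores A, with loadings B held fixed.
    n, k = X.shape[0], B.shape[1]
    A = np.zeros((n, k))
    # Step 1/L, where L bounds the curvature of the penalized objective.
    step = 1.0 / (np.linalg.norm(B, 2) ** 2 / 4.0 + ridge)
    for _ in range(n_iter):
        grad = (X - sigmoid(A @ B.T)) @ B - ridge * A
        A = A + step * grad
    return A

rng = np.random.default_rng(0)
n, d, k = 200, 15, 2                                 # assumed sizes
B = np.linalg.qr(rng.normal(size=(d, k)))[0] * 3.0   # assumed loadings
A_true = rng.normal(size=(n, k))                     # assumed true scores
X = (rng.random((n, d)) < sigmoid(A_true @ B.T)).astype(float)

A_hat = fit_scores_known_loadings(X, B)

# Ratio of average estimated to average true score magnitude; a value
# above 1 would be consistent with the inflation discussed above.
ratio = np.linalg.norm(A_hat, axis=1).mean() / np.linalg.norm(A_true, axis=1).mean()
print(f"mean |a_hat| / mean |a_true| = {ratio:.3f}")
```

Comparing the printed ratio across different values of d gives an informal sense of how the score magnitudes behave when the data dimension is small; a bias-reduced estimator could be substituted for the penalized fit to compare.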


Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program