Abstract:
|
Supervised dimensionality reduction techniques, such as partial least squares and supervised principal components, are powerful tools for making predictions with a large number of variables (p >> n). The implicit squared error terms in the objectives, however, make it less attractive to non-Gaussian data, either in the covariates or the responses. Drawing on a connection between partial least squares and the Gaussian distribution, we show how partial least squares can be extended to other members of the exponential family -- similar to the generalized linear model -- for both the covariates and the responses. Unlike previous attempts, our extension gives latent variables which are easily interpretable as linear functions of the data and is computationally efficient. In particular, it does not require additional optimization for the scores of new observations and therefore predictions can be made in real time. We show the promise of our method by using a large set of binary medical diagnoses from electronic medical records to predict the probability of a patient at ICU developing a pressure ulcer.
|