Abstract:
|
Canonical correlation analysis (CCA) seeks to characterize the relationship between two sets of variables X and Y by finding a linear combination of the variables in X and a linear combination of the variables in Y such that the correlation between the two is maximized. These linear combinations are called the canonical vectors, and their correlation is called the canonical correlation. When the dimension of one or both datasets is large, one would additionally like to improve interpretability by finding sparse estimates of the canonical vectors, so that only a small subset of the variables in X and Y have nonzero coefficients. We propose a sparse CCA algorithm based on the proximal gradient method. First, we rewrite the CCA objective as the minimization of a quadratic form. We then solve a regularized version of the optimization problem that yields sparse estimates of the canonical vectors. We consider the LASSO, adaptive LASSO, SCAD, and MCP (minimax concave) penalties, each of which admits a proximal operator that reduces to a thresholding step. We compare the proposed method to methods based on alternating minimization.
|
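The core computational pattern the abstract describes, a gradient step on a smooth quadratic objective followed by a penalty-specific thresholding step, can be sketched as below. This is a generic illustration, not the paper's algorithm: the function names, the least-squares test problem, and the choice of the LASSO penalty (whose proximal operator is soft-thresholding) are all assumptions made for the example.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (the LASSO penalty):
    # shrinks each coordinate toward zero and zeroes out small entries,
    # which is what produces sparse estimates.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient(grad, prox, w0, step, lam, n_iter=500):
    # Generic proximal gradient iteration: take a gradient step on the
    # smooth part, then apply the penalty's proximal (thresholding) map.
    w = w0.copy()
    for _ in range(n_iter):
        w = prox(w - step * grad(w), step * lam)
    return w

# Illustrative use on a sparse least-squares problem:
# minimize (1/2)||X w - y||^2 + lam * ||w||_1.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.5, 1.0]          # only 3 nonzero coefficients
y = X @ w_true
step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
grad = lambda w: X.T @ (X @ w - y)
w_hat = proximal_gradient(grad, soft_threshold, np.zeros(10), step, lam=0.5)
```

Swapping in SCAD or MCP only changes the `prox` argument; both also act coordinate-wise as (nonconvex) thresholding rules.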