Abstract:
|
Canonical Correlation Analysis (CCA) is widely used to integrate multi-view data vectors into a unified low-dimensional representation. CCA assumes that data vectors have one-to-one associations across different views. However, several datasets, such as NUS-WIDE (Chua et al. 2009), which is composed of images and their multiple tags, contain many-to-many associations rather than one-to-one associations. To utilize such complicated associations, Shimodaira (2016) extends CCA to Cross-Domain Matching Correlation Analysis (CDMCA). While some studies have already demonstrated CDMCA's advantages in application experiments, its theoretical properties are still not well understood. In this presentation, we give a theoretical guarantee for CDMCA. First, we propose a novel probabilistic model that can explain data vectors with many-to-many associations. Then we apply CDMCA to data vectors and associations generated from this probabilistic model, and we prove CDMCA's statistical consistency under some regularity conditions. Our result indicates that CDMCA asymptotically recovers the underlying low-dimensional structure of multi-view data with complicated associations.
|