Activity Number:
|
113
- New Developments on Data Integration and Data Fusion
|
Type:
|
Topic Contributed
|
Date/Time:
|
Monday, July 29, 2019 : 8:30 AM to 10:20 AM
|
Sponsor:
|
Section on Statistical Learning and Data Science
|
Abstract #305023
|
|
Title:
|
Sparse Semiparametric Canonical Correlation Analysis for Data of Mixed Types
|
Author(s):
|
Irina Gaynanova* and Grace Yoon and Raymond J. Carroll
|
Companies:
|
Texas A&M University and Texas A&M University and Texas A&M University
|
Keywords:
|
BIC;
Gaussian copula;
Latent correlation;
zero inflation
|
Abstract:
|
Canonical correlation analysis investigates linear relationships between two sets of variables, but often works poorly on modern data sets due to high dimensionality and mixed data types (continuous/binary/zero-inflated). We propose a new approach for sparse canonical correlation analysis of mixed data types that does not require explicit parametric assumptions. Our main contribution is the use of a truncated latent Gaussian copula to model data with excess zeroes, which allows us to derive a rank-based estimator of the latent correlation matrix without estimating the marginal transformation functions. The resulting semiparametric sparse canonical correlation analysis method works well in high-dimensional settings, as demonstrated via numerical studies and an application to the analysis of the association between gene expression and microRNA data of breast cancer patients.
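
To illustrate why a rank-based estimator can avoid estimating the marginal transformations, the following minimal sketch covers only the simplest case: two continuous variables under a Gaussian copula, where the latent correlation r and Kendall's tau satisfy the standard identity r = sin(pi/2 * tau). It is not the truncated (zero-inflated) estimator described in the abstract, whose bridge function is not given here. All function names and the simulated data are hypothetical and chosen for illustration.

```python
import math
import random


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tie
    return 2.0 * s / (n * (n - 1))


def latent_correlation_continuous(x, y):
    """Continuous/continuous case only: invert tau via r = sin(pi/2 * tau)."""
    return math.sin(math.pi / 2.0 * kendall_tau(x, y))


def simulate(n, rho, seed=0):
    """Hypothetical data: latent bivariate normal, then monotone transforms."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        x.append(math.exp(z1))  # unknown monotone marginal transformation
        y.append(z2 ** 3)       # another monotone marginal transformation
    return x, y


x, y = simulate(500, 0.6)
r_hat = latent_correlation_continuous(x, y)
print(f"estimated latent correlation: {r_hat:.3f}")
```

Because Kendall's tau depends only on the ordering of the observations, the estimate is unaffected by the monotone transformations applied in `simulate`, so those transformations never need to be estimated.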
|
Authors who are presenting talks have a * after their name.