Abstract Details

Activity Number: 74 - Developments in Epidemiologic Models
Type: Contributed
Date/Time: Sunday, July 30, 2017 : 4:00 PM to 5:50 PM
Sponsor: Section on Statistics in Epidemiology
Abstract #324180
Title: Missing Data in Canonical Correlation Analysis
Author(s): Emily Slade* and Brent Coull and Peter Kraft
Companies: and Harvard T.H. Chan School of Public Health and Harvard University
Keywords: Missing data ; Canonical correlation analysis ; Imputation ; Power
Abstract:

Canonical correlation analysis (CCA) provides a global test and measure of association between two multivariate sets of variables measured on the same individuals. In large multivariate settings, the proportion of subjects missing data on at least one variable can be high. In practice, missing data have typically been handled before performing CCA by complete case analysis, unconditional mean imputation, or k-nearest neighbors approaches. For each of these methods, as well as for more sophisticated imputation methods, we examine bias of the first canonical correlation and power of a test of association between the two sets of variables. Even when the data are missing completely at random (MCAR), bias is quite large in complete case analysis due to the strong link between sample size and bias in CCA. Surprisingly, tree-based imputation does not outperform naive single imputation methods. We present advances in performing multiple imputation, which is nontrivial due to the lack of a likelihood function in CCA. We offer recommendations for imputation in CCA based on simulated data with wide-ranging complexity, and we apply these methods to relate dietary variables to blood lipid levels.
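As an illustration of the quantities discussed above, the following is a minimal sketch, not the authors' simulation design. It computes the first canonical correlation under complete case analysis and under unconditional mean imputation when entries are deleted completely at random. The sample size, dimensions, true correlation of 0.5, 5% missingness rate, and all function names are assumptions chosen for illustration only.

```python
import numpy as np

def first_canonical_correlation(X, Y):
    """Largest canonical correlation between the columns of X and Y.

    Rows are subjects. Uses the QR/SVD approach: the singular values of
    Qx^T Qy, where Qx and Qy are orthonormal bases of the centered data.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(min(s[0], 1.0))  # clip tiny floating-point overshoot

def complete_cases(X, Y):
    """Keep only subjects with no missing value in either variable set."""
    keep = ~(np.isnan(X).any(axis=1) | np.isnan(Y).any(axis=1))
    return X[keep], Y[keep]

def mean_impute(A):
    """Replace each missing entry with the observed mean of its column."""
    col_means = np.nanmean(A, axis=0)
    return np.where(np.isnan(A), col_means, A)

# Hypothetical simulated data: one pair of variables correlated at 0.5,
# all other variables independent noise.
rng = np.random.default_rng(0)
n, p, q = 200, 10, 10
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))
Y[:, 0] = 0.5 * X[:, 0] + np.sqrt(1 - 0.5**2) * Y[:, 0]

# MCAR: each entry is deleted independently with probability 0.05.
X_miss, Y_miss = X.copy(), Y.copy()
X_miss[rng.random((n, p)) < 0.05] = np.nan
Y_miss[rng.random((n, q)) < 0.05] = np.nan

X_cc, Y_cc = complete_cases(X_miss, Y_miss)
rho_full = first_canonical_correlation(X, Y)
rho_cc = first_canonical_correlation(X_cc, Y_cc)
rho_mean = first_canonical_correlation(mean_impute(X_miss), mean_impute(Y_miss))

print(f"complete cases retained: {X_cc.shape[0]} of {n}")
print(f"first canonical correlation, full data:       {rho_full:.3f}")
print(f"first canonical correlation, complete cases:  {rho_cc:.3f}")
print(f"first canonical correlation, mean imputation: {rho_mean:.3f}")
```

With 20 variables and 5% missingness per entry, only about 0.95^20, or roughly 36%, of subjects are expected to be complete, which shows how quickly complete case analysis shrinks the sample in a multivariate setting. A single run like this is not evidence about bias or power; repeating it over many simulated datasets would be needed for that.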


Authors who are presenting talks have a * after their name.

Back to the full JSM 2017 program
