444 – Contributed Oral Poster Presentations: Survey Research Methods Section
Detecting Novel Associations in Large Astrophysical Data Sets
Elizabeth Martinez-Gomez
Instituto Tecnologico Autonomo De Mexico
Mercedes Richards
Penn State University
Donald Richards
Penn State University
The distance correlation, a measure of dependence between collections of random variables, was introduced by Szekely, Rizzo, and Bakirov (2007) and Szekely and Rizzo (2009). Unlike the classical Pearson correlation coefficient, the distance correlation is zero only in the case of independence. Moreover, the distance correlation applies to pairs of random vectors of any dimension, rather than only to pairs of scalar variables, and it can detect nonlinear associations that the Pearson correlation coefficient misses. We apply the distance correlation to high-dimensional, large-sample astrophysical databases on galaxy clusters, and we identify new associations between numerous astrophysical variables. For certain pairs of variables, we find that the corresponding Pearson correlation coefficients can also be estimated, with high accuracy, from the distance correlation measures. Indeed, the distance correlation has a clear tendency to resolve some high-dimensional data into highly concentrated "horseshoe" graphs, which make it easier to identify patterns in the data.
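As an illustration of the measure itself, the following is a minimal sketch of the sample distance correlation in the double-centering form given by Szekely, Rizzo, and Bakirov (2007), assuming NumPy and Euclidean distances. The function names `centered_distances` and `distance_correlation` are hypothetical, the data are synthetic rather than the astrophysical databases analyzed here, and the O(n^2) memory use makes this version suitable only for moderate sample sizes.

```python
import numpy as np


def centered_distances(x):
    """Return the double-centered Euclidean distance matrix of a sample.

    x has shape (n,) or (n, p): n observations of a p-dimensional vector.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    # Pairwise Euclidean distances between observations, shape (n, n).
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    # Subtract row means and column means, then add back the grand mean.
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())


def distance_correlation(x, y):
    """Sample distance correlation between paired samples x and y.

    x and y must have the same number of observations; their
    dimensions may differ.
    """
    a = centered_distances(x)
    b = centered_distances(y)
    dcov2 = (a * b).mean()      # squared sample distance covariance
    dvar2_x = (a * a).mean()    # squared sample distance variance of x
    dvar2_y = (b * b).mean()    # squared sample distance variance of y
    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0.0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


if __name__ == "__main__":
    # Synthetic example (an assumption, not data from this study):
    # y depends on x, but not linearly.
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, size=500)
    y = x ** 2
    print("Pearson correlation: ", np.corrcoef(x, y)[0, 1])
    print("Distance correlation:", distance_correlation(x, y))
```

In this synthetic case the Pearson coefficient should be close to zero because the dependence is symmetric about x = 0, while the distance correlation should be clearly positive. Because the function accepts arrays of shape (n, p) and (n, q), the same call applies to vector-valued variables of different dimensions.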