Abstract:
|
Motivated by the pressing need to capture complex yet interpretable relationships between variables in scientific research, we generalize the squared Pearson correlation, i.e., the R square, to capture a mixture of linear dependences between two real-valued random variables, with or without an index variable that specifies the line memberships. To compute the generalized R square measure from a sample without line-membership specification, we develop a K-lines clustering algorithm. We also define the population-level generalized R square measures and derive the asymptotic distributions of the sample-level measures to enable efficient statistical inference. Simulation studies verify the theoretical results numerically and compare the power of the generalized R square measures with that of widely used association measures. Gene expression data analysis demonstrates the effectiveness of the generalized R square measures in capturing interpretable gene-gene relationships missed by other measures. We implement the estimation and inference procedures in an R package, gR2.
|