Abstract:
|
In scientific studies involving multivariate data, two questions commonly arise: whether the sample is exchangeable, meaning that the joint distribution of the sample is invariant to the ordering of the units, and whether the features can be partitioned into mutually independent groups. We propose a non-parametric approach that addresses both questions. Our approach is conceptually simple, fast, and flexible. It controls the Type I error across realistic scenarios and handles data of arbitrary dimensions by leveraging large-sample asymptotics. In the exchangeability detection setting, through extensive simulations and a comparison against unsupervised tests of stratification based on random matrix theory, we find that our approach compares favorably in various scenarios of interest. We apply our method to genomic questions such as identifying optimal linkage disequilibrium (LD) blocks and identifying panmictic populations. We also apply our approach to post-clustering single-cell chromatin accessibility data and World Values Survey data, showing how users can divide features into independent groups, which helps generate new scientific hypotheses about the features.
|