High-dimensional nonparametric two-sample testing deals with the question of consistently deciding whether two D-dimensional distributions are different, given N samples from each, without making any parametric distributional assumptions, in a setting where D can increase with N. The Maximum Mean Discrepancy statistic with the Gaussian kernel (G-MMD) and the Energy Distance statistic with the Euclidean norm (E-ED) are very general tests, known to be consistent against all alternatives.
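As a rough illustration of the two statistics named above, the following is a minimal sketch of their quadratic-time (U-statistic) estimates. The function names, the kernel parametrization exp(-||x - y||^2 / (2 * bandwidth^2)), and the choice of the unbiased U-statistic form are assumptions made for this example, not details taken from the abstract.

```python
import numpy as np

def sq_dists(A, B):
    """Pairwise squared Euclidean distances between rows of A and rows of B."""
    a2 = np.sum(A ** 2, axis=1)[:, None]
    b2 = np.sum(B ** 2, axis=1)[None, :]
    # Clip tiny negative values caused by floating-point cancellation.
    return np.maximum(a2 + b2 - 2.0 * (A @ B.T), 0.0)

def gaussian_mmd2(X, Y, bandwidth):
    """Unbiased quadratic-time estimate of squared MMD with a Gaussian kernel.

    Assumed kernel: k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)).
    """
    n, m = len(X), len(Y)
    scale = 2.0 * bandwidth ** 2
    Kxx = np.exp(-sq_dists(X, X) / scale)
    Kyy = np.exp(-sq_dists(Y, Y) / scale)
    Kxy = np.exp(-sq_dists(X, Y) / scale)
    # Exclude the diagonal (i == j) terms from the within-sample averages.
    np.fill_diagonal(Kxx, 0.0)
    np.fill_diagonal(Kyy, 0.0)
    return (Kxx.sum() / (n * (n - 1))
            + Kyy.sum() / (m * (m - 1))
            - 2.0 * Kxy.mean())

def energy_distance(X, Y):
    """Quadratic-time estimate of the energy distance with the Euclidean norm."""
    n, m = len(X), len(Y)
    Dxx = np.sqrt(sq_dists(X, X))
    Dyy = np.sqrt(sq_dists(Y, Y))
    Dxy = np.sqrt(sq_dists(X, Y))
    # Distances from a point to itself are zero by definition.
    np.fill_diagonal(Dxx, 0.0)
    np.fill_diagonal(Dyy, 0.0)
    return (2.0 * Dxy.mean()
            - Dxx.sum() / (n * (n - 1))
            - Dyy.sum() / (m * (m - 1)))
```

Both functions cost O(N^2 D) time. In practice a rejection threshold would still be needed (for example from a permutation procedure), which this sketch does not cover.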
Our contribution is to explicitly characterize the power of the linear- and quadratic-time versions of G-MMD and E-ED when the two distributions differ in their means. We find that (a) there is a computation-power tradeoff: more computation yields a direct statistical benefit; (b) power is (almost) independent of the kernel bandwidth; (c) E-ED and G-MMD have exactly the same power; and (d) E-ED and G-MMD enjoy a free lunch: they have the same power as specialized tests for detecting mean differences.
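To make the computation-power tradeoff in (a) concrete, here is a hedged sketch of one common linear-time MMD estimate, which averages a kernel contrast over disjoint consecutive pairs of samples. The abstract does not specify which linear-time construction is analyzed, so the pairing scheme, the function names, and the kernel parametrization exp(-||x - y||^2 / (2 * bandwidth^2)) are all assumptions here.

```python
import numpy as np

def gaussian_kernel_rows(A, B, bandwidth):
    """Row-wise Gaussian kernel k(A[i], B[i]); A and B must have equal shape."""
    sq = np.sum((A - B) ** 2, axis=1)
    return np.exp(-sq / (2.0 * bandwidth ** 2))

def linear_time_mmd2(X, Y, bandwidth):
    """Linear-time estimate of squared MMD from disjoint sample pairs.

    Uses h = k(x1, x2) + k(y1, y2) - k(x1, y2) - k(x2, y1), averaged over
    consecutive non-overlapping pairs, so the cost is O(N * D).
    """
    # Use an even number of samples common to both sets.
    n = (min(len(X), len(Y)) // 2) * 2
    x1, x2 = X[0:n:2], X[1:n:2]
    y1, y2 = Y[0:n:2], Y[1:n:2]
    h = (gaussian_kernel_rows(x1, x2, bandwidth)
         + gaussian_kernel_rows(y1, y2, bandwidth)
         - gaussian_kernel_rows(x1, y2, bandwidth)
         - gaussian_kernel_rows(x2, y1, bandwidth))
    return h.mean()
```

This estimate touches each sample only a constant number of times, whereas a quadratic-time estimate compares all pairs. The linear-time version is typically noisier at the same N, which is the kind of tradeoff the abstract refers to.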
This is the first explicit power derivation for any general nonparametric test in the high-dimensional setting, and the first proof of adaptivity of tests designed for general alternatives, the latter having important practical implications.