Keywords: local two-sample test, scan statistics, spatial statistics, high-dimensional two-sample test
Two-sample testing is a fundamental tool for scientific discovery. Yet, aside from concluding that two samples do not come from the same probability distribution, it is often of interest to characterize how the two distributions differ. Specifically, given labelled samples from two unknown densities $f_0$ and $f_1$, we consider the problem of localizing occurrences of the inequality $f_1 > f_0$ in the combined sample. We present a hypothesis testing framework for this task where localization is achieved using probability distributions that are defined by a random walk over the sample, and provide a tractable testing procedure utilizing a type of scan statistic. We derive non-asymptotic lower bounds on the power and accuracy of our test to detect whether $f_1 > f_0$ in a local sense, characterize the test's consistency according to a certain problem-hardness parameter, and show that our test achieves the minimax rate for this parameter. We apply our method to single-cell RNA sequencing data from melanoma patients, demonstrating our method's potential for extracting novel scientific insights from the data.