
All Times EDT

Abstract Details

Activity Number: 168 - SLDS Student Paper Awards
Type: Topic Contributed
Date/Time: Tuesday, August 4, 2020 : 10:00 AM to 11:50 AM
Sponsor: Section on Statistical Learning and Data Science
Abstract #309787
Title: Classification Accuracy as a Proxy for Two-Sample Testing
Author(s): Ilmun Kim* and Aaditya Ramdas and Aarti Singh and Larry Wasserman
Companies: Carnegie Mellon University and Carnegie Mellon University and Carnegie Mellon University and Carnegie Mellon University
Keywords: Classification accuracy; High-dimensional asymptotics; Linear discriminant analysis; Two-sample testing; Hotelling's test; Minimax optimality

When data analysts train a classifier and check whether its accuracy is significantly different from chance, they are implicitly performing a two-sample test. We investigate the statistical properties of this flexible approach in the high-dimensional setting. We first present general conditions under which a classifier-based test is consistent, meaning that its power converges to one. To get a finer understanding of the rates of consistency, we study a specialized setting: distinguishing two Gaussians with different means and a common covariance. By focusing on Fisher's linear discriminant analysis (LDA) and its high-dimensional variants, we provide asymptotic but explicit power expressions for classifier-based tests and contrast them with those of the corresponding Hotelling-type tests. Surprisingly, the power expressions match exactly in their dependence on the parameters of interest, and the LDA approach is worse only by a constant factor.
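To make the idea of "accuracy as a test statistic" concrete, here is a minimal sketch, not the authors' code. It assumes two equal-size samples, a 50/50 train/test split, plain Fisher LDA with a pooled covariance estimate, and a one-sided normal approximation to the null distribution of held-out accuracy. The function names (`lda_accuracy_test`, `hotelling_t2`, `sample_gaussian`, and the helpers) are hypothetical. The high-dimensional LDA variants and exact calibration studied in the paper may differ. Hotelling's T² statistic is computed alongside it for contrast.

```python
import math
import random


def mean(rows):
    """Coordinate-wise sample mean of a list of d-dimensional points."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pooled_cov(x, y, ridge=0.0):
    """Pooled within-sample covariance; `ridge` is an optional (assumed) diagonal shrinkage."""
    mx, my = mean(x), mean(y)
    d = len(mx)
    s = [[0.0] * d for _ in range(d)]
    for rows, m in ((x, mx), (y, my)):
        for r in rows:
            c = [a - b for a, b in zip(r, m)]
            for i in range(d):
                for j in range(d):
                    s[i][j] += c[i] * c[j]
    dof = len(x) + len(y) - 2
    return [[s[i][j] / dof + (ridge if i == j else 0.0) for j in range(d)] for i in range(d)]


def solve(a, b):
    """Solve a @ v = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= f * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def lda_accuracy_test(x, y, seed=0):
    """Classifier-based two-sample test: fit LDA on half the data, test accuracy on the rest.

    Returns (held-out accuracy, one-sided p-value for H0: accuracy = 1/2).
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    rng = random.Random(seed)
    x, y = x[:], y[:]
    rng.shuffle(x)
    rng.shuffle(y)
    h = len(x) // 2
    xtr, xte, ytr, yte = x[:h], x[h:], y[:h], y[h:]
    # Fisher LDA direction w = S^{-1}(mean_x - mean_y), threshold at the midpoint.
    mx, my = mean(xtr), mean(ytr)
    w = solve(pooled_cov(xtr, ytr), [a - b for a, b in zip(mx, my)])
    c = sum(wi * (a + b) / 2 for wi, a, b in zip(w, mx, my))

    def score(z):
        return sum(wi * zi for wi, zi in zip(w, z)) - c

    correct = sum(score(z) > 0 for z in xte) + sum(score(z) <= 0 for z in yte)
    n = len(xte) + len(yte)
    acc = correct / n
    # Balanced test sets give chance accuracy exactly 1/2 under H0; 1/(4n) bounds its variance.
    zstat = (acc - 0.5) / math.sqrt(0.25 / n)
    return acc, 0.5 * math.erfc(zstat / math.sqrt(2))


def hotelling_t2(x, y):
    """Hotelling's two-sample T^2 statistic with the pooled covariance."""
    diff = [a - b for a, b in zip(mean(x), mean(y))]
    v = solve(pooled_cov(x, y), diff)
    n1, n2 = len(x), len(y)
    return n1 * n2 / (n1 + n2) * sum(a * b for a, b in zip(diff, v))


def sample_gaussian(n, mu, rng):
    """Draw n points from N(mu, I)."""
    return [[rng.gauss(m, 1.0) for m in mu] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    d = 5
    x = sample_gaussian(200, [0.0] * d, rng)
    y = sample_gaussian(200, [1.5] + [0.0] * (d - 1), rng)
    print("LDA accuracy, p-value:", lda_accuracy_test(x, y))
    print("Hotelling T^2:", hotelling_t2(x, y))
```

Both procedures use the same ingredient, the pooled-covariance-weighted mean difference, which is one intuitive way to see why their power could depend on the same parameters. The split-sample design here is only one simple choice. Other resampling or calibration schemes are possible and may behave differently.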

Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program