All Times EDT

Abstract Details

Activity Number: 400 - Breiman Award Lectures
Type: Invited
Date/Time: Thursday, August 12, 2021, 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #314483
Title: Hypothesis Testing After Hypothesis Generation
Author(s): Daniela Witten*, Jacob Bien, Lucy Gao, and Anna Neufeld
Affiliations: University of Washington, University of Southern California, University of Waterloo, and University of Washington
Abstract:

As datasets continue to grow in size, the focus of data collection in many settings has shifted away from testing pre-specified hypotheses and towards hypothesis generation. Researchers often perform an exploratory data analysis to generate hypotheses and then test those hypotheses on the same data; I will refer to this as 'double dipping'. Unfortunately, double dipping can lead to a highly inflated Type I error rate. In this talk, I will consider the special cases of hierarchical clustering and CART decision trees. First, I will show that sample splitting does not solve the double-dipping problem for clustering. Then, using a selective inference framework, I will propose a test for a difference in means between estimated clusters that accounts for the cluster estimation process. Finally, I will show that a similar approach can be applied to test hypotheses about a fitted CART decision tree. This is joint work with Lucy Gao (University of Waterloo), Anna Neufeld (University of Washington), and Jacob Bien (University of Southern California).
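The double-dipping problem can be seen in a small simulation. The sketch below is only an illustration under simplifying assumptions that do not come from the abstract: one-dimensional data drawn from a single N(0, 1) population (so there are no true clusters), Ward-linkage hierarchical clustering cut at two clusters, a naive two-sample z-test with known noise standard deviation 1, and, for the sample-splitting variant, assignment of each held-out point to the nearest estimated cluster mean. The function names are hypothetical, and the code does not implement the proposed selective test; it only estimates how often the naive test rejects a true null.

```python
import random
from statistics import NormalDist, fmean


def ward_two_clusters(x):
    """Hypothetical helper: Ward-linkage hierarchical clustering of 1-D data,
    cut when two clusters remain. Returns the two clusters as lists."""
    clusters = [[v] for v in x]
    while len(clusters) > 2:
        means = [fmean(c) for c in clusters]
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ni, nj = len(clusters[i]), len(clusters[j])
                # Ward's criterion: increase in within-cluster sum of squares
                # caused by merging clusters i and j.
                cost = ni * nj / (ni + nj) * (means[i] - means[j]) ** 2
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0], clusters[1]


def z_test_pvalue(a, b, sigma=1.0):
    """Naive two-sided z-test for equal means (assumes known noise sd sigma),
    treating the groups as if they had been fixed before seeing the data."""
    if not a or not b:
        return 1.0  # an empty group gives no evidence; count as "not rejected"
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rates(n=30, reps=200, alpha=0.05, seed=1):
    """Estimate Type I error rates under a global null (no true clusters)."""
    rng = random.Random(seed)
    naive = split = 0
    for _ in range(reps):
        # Every observation is N(0, 1), so any "difference between clusters"
        # is an artifact of the clustering step.
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]

        # Double dipping: cluster and test on the same data.
        a, b = ward_two_clusters(x)
        naive += int(z_test_pvalue(a, b) < alpha)

        # Sample splitting: cluster the first half, assign each held-out point
        # to the nearest estimated cluster mean (an assumption), and test on
        # the held-out half only.
        train, held_out = x[: n // 2], x[n // 2 :]
        c1, c2 = ward_two_clusters(train)
        m1, m2 = fmean(c1), fmean(c2)
        g1 = [v for v in held_out if abs(v - m1) <= abs(v - m2)]
        g2 = [v for v in held_out if abs(v - m1) > abs(v - m2)]
        split += int(z_test_pvalue(g1, g2) < alpha)
    return naive / reps, split / reps


if __name__ == "__main__":
    naive_rate, split_rate = rejection_rates()
    print(f"double dipping:   {naive_rate:.2f}")
    print(f"sample splitting: {split_rate:.2f}")
```

Under these assumptions both estimated rejection rates land far above the nominal 0.05 level. In the sample-splitting variant, the held-out groups are still defined by thresholding the held-out observations themselves, so the group labels depend on the same data used in the test.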


Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program