Abstract:
|
Large-scale A/B/n testing often yields a Multiple Hypothesis Testing (MHT) problem with significant dependencies. These dependencies typically cause the best-known methods to be overly conservative, which often results in multiple testing issues being ignored or handled in ad-hoc ways. Developing MHT methods that can be used across many different groups in a company requires handling these arbitrary dependencies without knob-tuning and without introducing excess conservatism, while maintaining rejection regions that are easily understood (e.g., p-value thresholds). We discuss methods developed at Lyft to control the False Discovery Rate (FDR) in this manner. For the case where different subjects can be considered independent (such as user-split tests), we introduce a bootstrap method to estimate the false discoveries. For the case where no such assumption can be made (such as time-split tests), we introduce a permutation method to estimate the false discoveries. The accuracy and robustness of the methods are demonstrated on synthetic data and, where possible, on actual rideshare data.
|