Abstract:
|
A/B testing is at the front line of technology innovation. In this talk we tell two stories about large-scale A/B testing at Microsoft. First, we consider the Type M (magnitude) error of Gelman and Carlin (2014), i.e., the fact that making decisions only on statistically significant (e.g., p-value < 0.05) results naturally exaggerates the magnitude of the true underlying effect. This is closely related to the long-standing debate on how to amend (or replace) the traditional null hypothesis significance testing framework, which is widely regarded as a driver of the recent replication crisis. Second, we consider how to sharpen randomization-based causal inference for factorial designs with binary outcomes via the partial identification approach. Several simulated and real-world examples will be provided.
|
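A minimal simulation sketch (not from the talk) of the Type M error described above: assuming a normally distributed effect estimate with a known standard error and a small true effect (both values are illustrative assumptions), conditioning on statistical significance inflates the average estimated effect size well beyond the truth, especially when power is low.

```python
# Illustrative sketch of the Type M ("magnitude") error:
# conditional on statistical significance, the estimated effect
# tends to exaggerate the true effect when power is low.
import numpy as np

rng = np.random.default_rng(0)

true_effect = 0.2   # assumed small true lift (hypothetical value)
se = 0.5            # assumed standard error of the estimator (low power)
n_sims = 100_000

estimates = rng.normal(true_effect, se, n_sims)
z = estimates / se
significant = np.abs(z) > 1.96  # two-sided test at alpha = 0.05

exaggeration = np.mean(np.abs(estimates[significant])) / true_effect
print(f"Power: {significant.mean():.3f}")
print(f"Type M exaggeration ratio: {exaggeration:.2f}")
```

Under these assumed numbers the test has low power, and the significant estimates overstate the true effect several-fold, which is the phenomenon the first part of the talk addresses.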