Keywords: A/B testing, experimental design, randomization, causal inference from experimentation, hypothesis testing, automated testing platform
Inferring causality in a business setting is crucial for testing data science models: it verifies whether models trained on historical data actually improve key performance indicators in real time. Outside of ideal, lab-controlled experiments, time constraints often force a data scientist to draw conclusions from incomplete or noisy data using out-of-the-box solutions, which can lead to biased conclusions. In this talk we will present easy-to-implement tools and mechanisms that limit or even eliminate that bias.
We will start by outlining the basics of experimental design methods that address the most common issues in testing data science models online, such as interference/leakage between experimental units, attrition, and one- and two-sided non-compliance. Next, we will present several methods that make causal inference possible in non-ideal conditions, including using covariates to rescale results; blocked, clustered, or fully randomized group assignment; power analysis; and calculating treatment effects with formulas that account for uncertainty. We found that implementing these in our experimentation pipeline greatly improved the accuracy of our tests.
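To make two of the listed techniques concrete, here is a minimal sketch (not the talk's actual pipeline code) of a textbook power analysis for a two-sample test and a difference-in-means treatment effect with a confidence interval that accounts for sampling uncertainty. Function names and defaults are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def required_sample_size(effect_size, alpha=0.05, power=0.8):
    """Per-group sample size for a two-sided, two-sample z-test,
    given a standardized effect size (Cohen's d)."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(power)
    return int(np.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2))

def treatment_effect(control, treatment):
    """Difference in means with a 95% CI based on a
    Welch-style (unequal-variance) standard error."""
    diff = np.mean(treatment) - np.mean(control)
    se = np.sqrt(np.var(treatment, ddof=1) / len(treatment)
                 + np.var(control, ddof=1) / len(control))
    return diff, (diff - 1.96 * se, diff + 1.96 * se)

# Example: detecting a small effect (d = 0.2) at 80% power
# requires 393 users per group.
n_per_group = required_sample_size(0.2)
```

Running the power calculation before launch, rather than peeking at results until they look significant, is one of the simplest ways to avoid the biased conclusions described above.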
At a large company with multiple experiments running at the same time, it is crucial to automate experimental design. While implementing your own solution is time-consuming, it pays off in the long run because the testing platform is then tailored to your company’s specific needs. In the second part of this talk we will discuss the challenges faced by our team at Wayfair and an in-house product that automates experimental design and monitoring. We will put special emphasis on testing machine learning models used to optimize paid search bidding, as it poses challenges faced by every data scientist working in the marketing domain.