Online controlled experiments (e.g., A/B tests) are becoming the gold standard for evaluating improvements in software systems. From front-end user-interface changes to backend algorithms, from search engines (e.g., Google, Bing, Yahoo!) to retailers (e.g., Amazon, eBay, Etsy) to social networking services (e.g., Facebook, LinkedIn, Twitter) to travel services (e.g., Expedia, Airbnb, Booking.com) to many startups, online controlled experiments are now used to make data-driven decisions at a wide range of companies.
A key problem in evaluating treatments is that while experiments are run for a short period of time (usually two weeks), we want to understand the long-term impact of a treatment on the business and its users. Hence it becomes critical to understand whether there are novelty or primacy effects that would alter the size of the treatment effect over the long term.
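To make the problem concrete, one simple check (an illustrative sketch, not the method discussed in this talk) is to estimate the treatment effect separately for each day of the experiment and test for a trend: a fading lift suggests a novelty effect, a growing lift a primacy effect. The function name `daily_lift_trend` and the simulated data below are assumptions for illustration only.

```python
import numpy as np

def daily_lift_trend(control, treatment):
    """Estimate the per-day treatment effect and the slope of its
    trend over time via a least-squares fit.

    control, treatment: 2-D arrays of shape (days, users_per_day)
    holding a per-user metric for each day of the experiment.
    Returns (daily_lifts, slope); a negative slope suggests a novelty
    effect (lift fades), a positive slope a primacy effect (lift grows).
    """
    lifts = treatment.mean(axis=1) - control.mean(axis=1)
    days = np.arange(len(lifts))
    slope, _intercept = np.polyfit(days, lifts, 1)
    return lifts, slope

# Simulated 14-day experiment with a fading (novelty) effect:
rng = np.random.default_rng(0)
n_days, n_users = 14, 5000
control = rng.normal(0.0, 1.0, size=(n_days, n_users))
true_effect = 0.5 * np.exp(-np.arange(n_days) / 5.0)  # decays over time
treatment = rng.normal(true_effect[:, None], 1.0, size=(n_days, n_users))

lifts, slope = daily_lift_trend(control, treatment)
print(f"slope of daily lift: {slope:.4f}")  # negative here, since the lift fades
```

In practice, a daily-slice analysis like this is noisy and can be confounded by returning-user mix, which is part of why more careful detection methods are of interest.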
I will discuss our current research on detecting novelty and primacy effects and compare it with the common industry methods for doing so.