Keywords: Causal Inference, Social Networks, Time Series, Longitudinal Study
Randomized experiments (A/B tests) have become the standard way for web-facing companies to evaluate new products, guide product development, and prioritize ideas. There are times, however, when running an experiment is too complicated, costly, or time-consuming; in such cases, observational data can be used to quickly and cheaply obtain a reasonable estimate of the causal effect. Social networks, such as Facebook and LinkedIn, often employ strategies that encourage users to contribute (e.g., post, comment, like, and send private messages) because of the belief that this will increase a user's visitation frequency, which in turn will maximize revenue. Measuring the effect of contributions on a user's propensity to return to the site, and the impact on their neighborhood (i.e., the users they are connected to), can be done using experiments, but this requires an encouragement design (since we cannot directly force users to contribute) and the development of non-standard experimentation infrastructure. In this setting, however, we have an abundance of historical data on a user's contributions and her visitation pattern, which can be used to estimate the causal effect. In this paper, we present an approach based on linear fixed effects models to estimate the contemporaneous (or instantaneous) causal effect of a user's contribution on her own and her neighbors' subsequent visits. By using temporal data and a carefully designed observational study, we can reduce the self-selection bias and obtain accurate measures of the causal effect. Using LinkedIn data for several million members, we further show that temporal data allows us to remove more of the bias than state-of-the-art single-snapshot methods such as propensity score stratification, inverse propensity score weighting, and doubly-robust methods.
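To illustrate why repeated observations of the same user can reduce self-selection bias, the following is a minimal sketch on simulated data, not the model used in this paper. It assumes a single regressor, a hypothetical unobserved user trait that is constant over time and drives both contributions and visits, and a true effect of 0.5. All names (`trait`, `ols_slope`, `demean`, `TRUE_EFFECT`) are illustrative assumptions. The within-user (fixed effects) estimate subtracts each user's own mean before regressing, which cancels the time-invariant trait.

```python
import random

random.seed(0)
N_USERS, N_PERIODS, TRUE_EFFECT = 2000, 8, 0.5  # assumed toy settings


def ols_slope(xs, ys):
    """Slope of a one-regressor least-squares fit with an intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def demean(values):
    """Subtract a user's own time average (the within transformation)."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


pooled_x, pooled_y, within_x, within_y = [], [], [], []
for _ in range(N_USERS):
    # Hypothetical unobserved, time-invariant trait (source of self-selection).
    trait = random.gauss(0, 1)
    contributions = [trait + random.gauss(0, 1) for _ in range(N_PERIODS)]
    visits = [TRUE_EFFECT * c + trait + random.gauss(0, 1) for c in contributions]
    pooled_x += contributions
    pooled_y += visits
    within_x += demean(contributions)
    within_y += demean(visits)

# Ignores user identity: the trait confounds contributions and visits.
pooled_estimate = ols_slope(pooled_x, pooled_y)
# Fixed effects (within) estimate: the trait cancels out after demeaning.
within_estimate = ols_slope(within_x, within_y)

print(f"pooled: {pooled_estimate:.3f}, within: {within_estimate:.3f}")
```

Under these simulated assumptions, the pooled slope tends toward about 1.0 (the true 0.5 plus a confounding bias of 0.5), while the within-user slope tends toward the true 0.5. The sketch only addresses confounders that are constant over time; time-varying confounding and neighbor effects are outside its scope.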