Keywords: hierarchical data, difference-in-differences, longitudinal data, observational studies
Health services researchers rely on difference-in-differences (DiD) studies to evaluate the effects of policy changes on costs, quality, and outcomes. This method estimates the differential change in outcomes (from before to after the intervention) in the affected (intervention) group versus a comparison group. The design controls for stable differences between the groups and for exogenous changes over time that affect both groups. However, causal conclusions are justified only when the comparison group provides a good counterfactual for how the intervention group would have evolved in the absence of the intervention. Researchers may bolster their confidence in a comparison group’s validity by showing that it evolved similarly to the intervention group in the pre-period, but this is helpful only if trends are stable. We compare methods of evaluating and selecting comparison groups under a variety of data-generating processes motivated by observed patterns in real-world data. We then simulate data and compare the performance of DiD estimates resulting from a variety of comparison group selection strategies.
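For concreteness, the differential change described above can be written in its simplest two-group, two-period form (a standard formulation, not specific to this paper's estimator), where $\bar{Y}$ denotes a group mean outcome:

```latex
\widehat{\text{DiD}} \;=\;
\left( \bar{Y}_{\text{intervention},\,\text{post}} - \bar{Y}_{\text{intervention},\,\text{pre}} \right)
\;-\;
\left( \bar{Y}_{\text{comparison},\,\text{post}} - \bar{Y}_{\text{comparison},\,\text{pre}} \right)
```

The second difference removes both stable between-group differences and common time trends, which is why the validity of the estimate hinges on the comparison group's counterfactual trend.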