Abstract:
|
There is a growing literature on the use of machine learning methods to estimate heterogeneous treatment effects across subpopulations in randomized experiments. However, each proposed method makes its own set of restrictive assumptions about the intervention's effects, the underlying data-generating processes, and which subpopulation-level effects to explicitly estimate. Moreover, the majority of the literature provides no guidance on identifying the most significantly affected subpopulations. Therefore, we propose a new method for automatically identifying the subpopulation that experiences the largest distributional change as a result of the intervention, while making minimal assumptions about the intervention's effects or the underlying data-generating process. We provide statistical bounds on the method's error and detection power, in addition to sufficient conditions for exact identification of the affected subpopulation. Finally, we validate the efficacy of the method by identifying heterogeneous treatment effects both in simulated datasets and in real-world data from several well-known program evaluation studies.
|