Keywords: bias correction methods, causal inference, sample selection bias, average treatment effect
Retrospective observational studies that involve causal inference under sample selection bias often rely on imbalanced samples, in which the number of control units is much larger than the number of treated units. To investigate how sample imbalance affects the accuracy of average treatment effect estimates obtained through different corrective methods, we conduct a Monte Carlo simulation study over a range of sample sizes and degrees of class imbalance. We compare several widely used corrective methods: propensity score matching, the doubly robust matching estimator, OLS regression, and the Heckman treatment effect model. To emphasize the importance of the underlying model assumptions, we adopt data-generating scenarios that mirror different types of selection bias. We report a comparative analysis of model performance under selection on observables, selection on unobservables, and unmeasured confounding.
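As a rough illustration of the kind of experiment summarized above, the following is a minimal sketch, not the study's actual design. It covers only one scenario (selection on observables) and only two estimators (a naive difference in means and OLS regression). The function names, the logistic selection rule, the coefficient values, the true effect of 2.0, and the default sample size and treated share are all assumptions chosen for illustration.

```python
import numpy as np


def simulate_once(rng, n, treated_share, true_ate):
    """Draw one sample under selection on observables (illustrative design)."""
    x = rng.normal(size=n)  # observed confounder
    # Assumed logistic selection rule; the intercept controls the treated
    # share only approximately, because x also shifts the probability.
    intercept = np.log(treated_share / (1.0 - treated_share))
    p = 1.0 / (1.0 + np.exp(-(intercept + x)))
    d = rng.binomial(1, p)  # treatment indicator
    # Assumed linear outcome: x raises both treatment odds and the outcome.
    y = 1.0 + true_ate * d + 1.5 * x + rng.normal(size=n)
    return x, d, y


def naive_diff(d, y):
    """Difference in mean outcomes, ignoring the confounder."""
    return y[d == 1].mean() - y[d == 0].mean()


def ols_ate(x, d, y):
    """Coefficient on d from an OLS regression of y on (1, d, x)."""
    design = np.column_stack([np.ones_like(x), d, x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]


def monte_carlo(n=2000, treated_share=0.05, reps=200, seed=0, true_ate=2.0):
    """Average bias of both estimators over repeated simulated samples."""
    rng = np.random.default_rng(seed)
    naive, ols = [], []
    for _ in range(reps):
        x, d, y = simulate_once(rng, n, treated_share, true_ate)
        if d.sum() in (0, n):  # skip degenerate draws with a single group
            continue
        naive.append(naive_diff(d, y))
        ols.append(ols_ate(x, d, y))
    return {
        "naive_bias": float(np.mean(naive) - true_ate),
        "ols_bias": float(np.mean(ols) - true_ate),
    }
```

Calling `monte_carlo` with different values of `n` and `treated_share` gives a simple grid over sample size and imbalance. In this assumed design the naive estimator is biased upward because `x` raises both the probability of treatment and the outcome, while OLS is correctly specified; the other scenarios and estimators named in the abstract would need their own data-generating processes and implementations.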