Abstract:
|
Propensity score (PS) methods are widely used to reduce bias in treatment effect estimates. Machine learning methods, compared with logistic regression, have shown good performance in estimating propensity scores in single-level settings. In education, however, data are often naturally clustered, e.g., students nested within schools. Using Monte Carlo simulation, this study further examines the performance of leading machine learning methods (GBM, BART with random intercept) in estimating PSs in multilevel observational studies, as compared with parametric methods (multilevel fixed and random effects). Manipulated factors include the number of clusters, cluster sizes, intraclass correlations (ICCs), the number of covariates, the distributions of level-2 random errors, and the degree of non-linearity. Estimated PSs are compared on accuracy, degree of overfitting, and bias reduction via PS weighting. The conclusions are that (A) multilevel linear models have convergence issues when the sample size is insufficient; (B) if the ICC is high, the multilevel structure should be accounted for; and (C) machine learning methods are preferable if the number of covariates is large or if the linearity assumption may be violated.
|