All Times EDT

Abstract Details

Activity Number: 210 - SLDS CSpeed 3
Type: Contributed
Date/Time: Tuesday, August 10, 2021 : 1:30 PM to 3:20 PM
Sponsor: Social Statistics Section
Abstract #318929
Title: Performance of Parametric Versus Machine Learning Methods for Estimating Propensity Score with Multilevel Data: A Monte Carlo Study
Author(s): Tianyang Zhang* and Bryan Keller
Companies: Teachers College, Columbia University and Teachers College, Columbia University
Keywords: Propensity Score; Multilevel Data; Machine Learning
Abstract:

Propensity score (PS) methods are widely used to reduce bias in treatment effect estimates. Machine learning methods, compared with logistic regression, have performed well in estimating propensity scores in single-level settings. In education, however, many data are naturally clustered, e.g., students nested within schools. Using Monte Carlo simulation, this study further examines the performance of leading machine learning methods (GBM, BART with random intercept) for estimating PSs in multilevel observational studies, as compared with parametric methods (multilevel fixed and random effects). Manipulated factors include the number of clusters, cluster size, intraclass correlation (ICC), number of covariates, distribution of level-2 random errors, and degree of non-linearity. Estimated PSs are compared on accuracy, degree of overfitting, and bias reduction via PS weighting. The conclusions are: A) multilevel linear models have convergence issues when the sample size is insufficient; B) if the ICC is high, the multilevel structure should be accounted for; and C) machine learning methods are preferable if the number of covariates is large or if the linearity assumption may be violated.
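
To make the setup concrete, the following is a minimal sketch of one Monte Carlo replication with clustered data and PS weighting. It is not the authors' simulation design: the function names, coefficients, treatment effect, and random-intercept standard deviation are all illustrative assumptions, and the weights use the true (data-generating) propensity score rather than one estimated by GBM, BART, or a multilevel model. The latent-scale ICC uses the standard logistic residual variance of pi^2/3.

```python
import math
import random


def latent_icc(tau):
    """ICC on the latent logistic scale for a random-intercept SD of tau."""
    return tau ** 2 / (tau ** 2 + math.pi ** 2 / 3)


def simulate(n_clusters=100, cluster_size=40, tau=1.0, effect=2.0, seed=0):
    """Simulate units nested in clusters (all coefficients are assumptions)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_clusters):
        u = rng.gauss(0.0, tau)  # level-2 random intercept
        for _ in range(cluster_size):
            x = rng.gauss(0.0, 1.0)  # level-1 covariate
            ps = 1.0 / (1.0 + math.exp(-(0.8 * x + u)))  # true propensity score
            z = 1 if rng.random() < ps else 0  # treatment assignment
            y = effect * z + 1.5 * x + u + rng.gauss(0.0, 1.0)  # outcome
            rows.append((x, z, y, ps))
    return rows


def naive_difference(rows):
    """Unadjusted difference in mean outcomes, treated minus control."""
    treated = [y for _, z, y, _ in rows if z == 1]
    control = [y for _, z, y, _ in rows if z == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_ate(rows):
    """Normalized (Hajek) inverse-probability-weighted effect estimate."""
    w1 = sum(z / ps for _, z, _, ps in rows)
    w0 = sum((1 - z) / (1 - ps) for _, z, _, ps in rows)
    mu1 = sum(z * y / ps for _, z, y, ps in rows) / w1
    mu0 = sum((1 - z) * y / (1 - ps) for _, z, y, ps in rows) / w0
    return mu1 - mu0


rows = simulate()
print("latent ICC:", round(latent_icc(1.0), 3))
print("naive difference:", round(naive_difference(rows), 3))
print("IPW estimate:", round(ipw_ate(rows), 3))
```

Because both the covariate and the cluster intercept raise the chance of treatment and the outcome, the unadjusted difference overstates the assumed effect, while weighting by a propensity score that includes the cluster term largely removes that bias. A full study along the lines of the abstract would replace the true score with each estimator under comparison and repeat over many replications and factor levels.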


Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program