Abstract:
|
To draw unbiased inferences under the observational and quasi-experimental study designs common in educational research, matching methods are often applied to produce treatment and control groups that are balanced on all background variables. Although propensity scores have been a key component in such applications, propensity score-based matching methods are limited by model misspecification, categorical variables with more than two levels, missing data, and nonlinear relationships. Random forest, which averages predictions from many decision trees, is nonparametric in nature, straightforward to use, and capable of handling these issues. More importantly, the precision afforded by random forest may provide a more accurate and less model-dependent estimate of the propensity score. The proximity matrix, a by-product of the random forest, is also shown to be a natural distance measure between observations that may be used in matching. The proposed random forest-based matching methods are illustrated on a student success study evaluating the efficacy of a supplemental instruction section in a large-enrollment, bottleneck introductory statistics course.
|