Propensity score matching (PSM) methods are a commonly used approach to reducing selection bias when estimating average treatment effects. In addition to traditional logistic regression (LR) models, recent machine learning tools can also be used to estimate propensity scores. In this paper, we apply state-of-the-art machine learning techniques to improve propensity score estimation and benchmark their performance against traditional LR models. We perform comprehensive simulations, implementing 8 scenarios that mimic typical characteristics of both simple and complicated datasets. The simulation design considers: 1) high-dimensional covariates, 2) correlation among the covariates, and 3) the presence of unknown clustering. Model performance is evaluated by propensity score prediction accuracy, achieved covariate balance, and the mean squared error of treatment effect estimates. Our results suggest that machine-learning-based PSM achieves greater bias reduction than LR-based PSM in the complicated-data scenarios.
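For readers unfamiliar with the pipeline being benchmarked, the following is a minimal sketch of LR-based PSM, not the paper's implementation. It fits a logistic propensity model, performs 1:1 nearest-neighbour matching with replacement on the estimated score, and checks covariate balance with the standardized mean difference (SMD). The data-generating process (three normal covariates, the selection coefficients, and a constant treatment effect of 2) is a hypothetical assumption. Because the effect is constant, the matched estimate of the effect on the treated also targets the average treatment effect. The helper names `fit_logistic`, `propensity`, `match_nearest`, and `smd` are illustrative only.

```python
import numpy as np

def fit_logistic(X, t, n_iter=25):
    """Fit logistic regression P(T=1 | X) by Newton-Raphson (IRLS)."""
    Xd = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        w = p * (1.0 - p)                        # IRLS weights
        grad = Xd.T @ (t - p)                    # score vector
        hess = Xd.T @ (Xd * w[:, None])          # observed information
        beta += np.linalg.solve(hess + 1e-8 * np.eye(len(beta)), grad)
    return beta

def propensity(X, beta):
    """Estimated propensity scores from fitted coefficients."""
    Xd = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xd @ beta))

def match_nearest(ps, t):
    """1:1 nearest-neighbour matching on the propensity score, with replacement.
    Returns treated indices and the index of each one's matched control."""
    treated = np.flatnonzero(t == 1)
    control = np.flatnonzero(t == 0)
    dist = np.abs(ps[treated][:, None] - ps[control][None, :])
    return treated, control[np.argmin(dist, axis=1)]

def smd(x_t, x_c):
    """Standardized mean difference, a common covariate-balance diagnostic."""
    pooled_sd = np.sqrt((x_t.var(ddof=1) + x_c.var(ddof=1)) / 2.0)
    return (x_t.mean() - x_c.mean()) / pooled_sd

# Hypothetical simulated data: selection on X0 and X1, true effect = 2.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
true_logit = 0.8 * X[:, 0] + 0.5 * X[:, 1]
t = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(int)
y = 2.0 * t + X[:, 0] + X[:, 1] + rng.normal(size=n)

ps = propensity(X, fit_logistic(X, t))
treated, matched = match_nearest(ps, t)
att = np.mean(y[treated] - y[matched])            # matched-sample estimate
naive = y[t == 1].mean() - y[t == 0].mean()       # unadjusted difference
smd_before = smd(X[t == 1, 0], X[t == 0, 0])
smd_after = smd(X[treated, 0], X[matched, 0])
print(f"naive={naive:.2f} att={att:.2f} "
      f"smd_before={smd_before:.2f} smd_after={smd_after:.2f}")
```

In a benchmark like the one described, the machine learning variants would replace only the propensity step (`fit_logistic`/`propensity`) with another classifier's predicted probabilities, leaving matching and the balance and error metrics unchanged.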