Propensity score methods (PSM) are a commonly used approach for estimating average treatment effects in observational studies. Many of these methods have been shown to perform well and to produce unbiased estimates of the average treatment effect on the treated (ATT) in relatively simple studies. However, their performance on more complicated data sets is not well understood. To address this issue, a factorial simulation design was developed to compare several PSM, including both traditional and machine learning (ML) approaches. The factors were: 1) correlation of the covariates, 2) complexity of the propensity score model, 3) presence of unknown clustering, 4) number of covariates, and 5) complexity of the outcome functions. The methods were compared across all combinations of the factors. The ATT was estimated in two ways: from a data set matched on the propensity score, and by regression with the propensity score as an additional covariate. The comparison criterion was the bias of the estimated ATT. The results suggest that ML approaches produce a less biased ATT in many of the complicated data sets simulated. However, future work is needed for the most complicated data sets, where the estimated ATT was biased regardless of the PSM used.
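As a rough illustration of the two estimation routes described above (matching on the propensity score, and regression with the propensity score as an additional covariate), the following minimal sketch may help. It is not the study's simulation design: the data-generating process, the single covariate, the constant true effect of 2.0, the use of plain logistic regression rather than an ML model, and all function names (`simulate`, `fit_logistic`, `att_matching`, `att_regression`) are assumptions made for this example. With a constant treatment effect, as assumed here, the regression coefficient on treatment coincides with the ATT; that need not hold in general.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n=2000, true_effect=2.0, seed=0):
    """Toy observational data (assumed for illustration): one confounder x."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    # Treatment is more likely when x is large, so x confounds t and y.
    t = [1 if rng.random() < sigmoid(0.5 * xi) else 0 for xi in x]
    y = [1.0 + true_effect * ti + xi + rng.gauss(0, 1) for xi, ti in zip(x, t)]
    return x, t, y


def fit_logistic(x, t, iters=25):
    """Fit P(T=1 | x) = sigmoid(a + b*x) by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ti in zip(x, t):
            p = sigmoid(a + b * xi)
            w = p * (1.0 - p)
            g0 += ti - p
            g1 += (ti - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def att_matching(ps, t, y):
    """1:1 nearest-neighbour matching on the propensity score, with replacement."""
    treated = [i for i, ti in enumerate(t) if ti == 1]
    controls = [i for i, ti in enumerate(t) if ti == 0]
    diffs = []
    for i in treated:
        j = min(controls, key=lambda c: abs(ps[c] - ps[i]))
        diffs.append(y[i] - y[j])
    return sum(diffs) / len(diffs)


def att_regression(ps, t, y):
    """Regress y on an intercept, treatment, and the propensity score."""
    rows = [[1.0, float(ti), pi] for ti, pi in zip(t, ps)]
    return ols(rows, y)[1]  # coefficient on the treatment indicator


x, t, y = simulate()
a, b = fit_logistic(x, t)
ps = [sigmoid(a + b * xi) for xi in x]
att_match = att_matching(ps, t, y)
att_reg = att_regression(ps, t, y)
print(f"matching ATT estimate:   {att_match:.3f}")
print(f"regression ATT estimate: {att_reg:.3f}")
```

Because the true effect is fixed in the simulated data, the bias of each estimator can be read off as the estimate minus 2.0, which mirrors the comparison criterion used in the study on a much smaller scale.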