A dynamic treatment regime (DTR) is a sequence of decision rules applied over time, in which, for example, medical treatment is adjusted dynamically according to the patient’s responses. Using existing electronic medical records collected in clinical practice, we search for the optimal DTR, which maximizes a desirable outcome for as many patients as possible. Q-learning and A-learning are two reinforcement learning algorithms proposed for finding the optimal DTR. Here, we compare a novel application of Bayesian Additive Regression Trees (BART) with Q- and A-learning for K-stage DTRs. To assess how well different DTR algorithms identify the optimal DTR, we propose a revised R[d^(opt)]. The revised R[d^(opt)] improves on the original formula (Schulte et al., 2012) by ensuring that its value always falls between 0 and 1. We compare the different DTR methods in the two-stage setting using R[d^(opt)] under potential model misspecification. Finally, we apply the methods to real-world data derived from the US Cystic Fibrosis Patient Registry.