Dynamic treatment regimes (DTRs) are sequential decision rules that assign a treatment to each patient at each stage, adapting to the patient's evolving clinical course. Existing methods, however, typically account for each individual's medical history while overlooking the patient's preferences. We propose a method that incorporates patient preferences, via data augmentation, into a tree-based reinforcement learning framework to estimate optimal DTRs in multi-stage, multi-treatment settings. For each patient at each stage, we derive the posterior distribution of preferences given the patient's responses to a questionnaire, and then weight multiple outcomes by the estimated preferences to identify the optimal stage-wise personalized decision. In multi-stage settings, we grow an unsupervised decision tree at each stage and apply the algorithm recursively via backward induction. Simulation studies show that the proposed method is robust and efficient and yields interpretable DTR estimates.
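The core idea of weighting multiple outcomes by estimated preferences and optimizing stage-wise decisions via backward induction can be sketched as follows. This is a minimal NumPy illustration, not the paper's method: it assumes expected outcomes under each candidate treatment are already given as arrays, and that stage transitions do not depend on earlier treatments (a simplification); `backward_induction` and its inputs are hypothetical names.

```python
import numpy as np

def backward_induction(stage_outcomes, weights):
    """Toy preference-weighted backward induction.

    stage_outcomes: list (one array per stage) of arrays with shape
      (n_patients, n_treatments, n_outcomes) giving each patient's expected
      outcomes under each candidate treatment at that stage.
    weights: (n_patients, n_outcomes) estimated preference weights.
    Returns a list of per-stage optimal treatment indices and the total
    preference-weighted value accumulated across stages.
    """
    n_patients = weights.shape[0]
    future_value = np.zeros(n_patients)  # value of stages after the current one
    decisions = []
    # Process the last stage first, then move backward.
    for outcomes in reversed(stage_outcomes):
        # Utility of each treatment = preference-weighted outcomes + future value.
        util = np.einsum("ptk,pk->pt", outcomes, weights) + future_value[:, None]
        best = util.argmax(axis=1)
        future_value = util[np.arange(n_patients), best]
        decisions.append(best)
    decisions.reverse()  # reorder from first stage to last
    return decisions, future_value
```

For example, with two patients holding opposite preference weights over two outcomes, the rule selects a different treatment for each patient at the same stage, which is the stage-wise personalization the abstract describes.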