Dynamic treatment regimes (DTRs) adaptively prescribe treatments based on a patient's evolving health status over multiple treatment stages. Data from sequential multiple assignment randomized trials (SMARTs) are recommended for learning DTRs. However, because SMARTs re-randomize the same patients over multiple stages and require prolonged follow-up, they are often difficult to implement and costly to manage, and patient adherence is always a concern in practice. In this work, we propose an alternative approach that learns optimal DTRs by synthesizing independent trials conducted at different stages. We use a backward learning method to estimate the optimal treatment decision at a particular stage, where a patient's future optimal outcome increment is estimated using data from independent trials that carry information on future stages. Under certain conditions, we show that the proposed method yields consistent estimation of the optimal DTRs and attains the same learning rates as those obtained from SMARTs. We conduct simulation studies to demonstrate the advantage of the proposed method, and we apply it to learn an optimal DTR by stage-wise synthesis of two randomized trials for major depressive disorder.