Individualized treatment rule (ITR) estimation is a rapidly growing area in precision medicine, and various parametric and semi-parametric methods have been proposed. Recently, the Outcome Weighted Learning (OWL) approach recast the problem as one of weighted classification, opening the door to non-parametric modeling. OWL has improved performance, but not when the distribution of potential outcomes is unbalanced across optimal treatment groups. To address this shortcoming, we propose a Similarity-based Probability Weighted Learning (SPWL) method built on a two-stage model. In the first stage, we estimate the probability that each subject's assigned treatment is optimal using a similarity-weighted frequency. In the second stage, we use the estimated probability to weight the misclassification error, replace the 0-1 loss with a hinge loss, and solve the resulting optimization problem by quadratic programming. We investigate the finite-sample performance of the SPWL approach through extensive simulation studies, which demonstrate good efficiency and robustness. In particular, in the unbalanced-outcome scenario, SPWL improves prediction accuracy by more than 30\%.
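The two-stage procedure can be sketched in Python as a toy illustration. This is not the authors' implementation, and several pieces are assumptions: the Gaussian-kernel similarity and its bandwidth, the specific form of the stage-one frequency (how often a subject's outcome exceeds those of similar subjects in the other treatment arm), the linear decision rule, the simulated data, and all function names are hypothetical. Stage two minimizes a probability-weighted hinge loss, but by plain subgradient descent rather than the quadratic-programming solver described in the abstract.

```python
import math
import random

# Hypothetical similarity: Gaussian kernel on squared Euclidean distance.
def similarity(x, z, bandwidth=0.3):
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2.0 * bandwidth ** 2))

# Stage 1 (assumed form): similarity-weighted frequency with which subject i's
# outcome beats the outcomes of similar subjects who got the other treatment.
def stage1_probabilities(X, A, R, bandwidth=0.3):
    probs = []
    for i in range(len(X)):
        num = den = 0.0
        for j in range(len(X)):
            if A[j] == A[i]:
                continue  # compare only against the alternative treatment arm
            s = similarity(X[i], X[j], bandwidth)
            num += s * (1.0 if R[i] > R[j] else 0.0)
            den += s
        probs.append(num / den if den > 0 else 0.5)  # no comparators -> 0.5
    return probs

# Stage 2: linear rule f(x) = beta.x + b minimizing the weighted hinge loss
# (1/n) * sum_i w_i * max(0, 1 - A_i * f(x_i)) + (lam/2) * ||beta||^2.
# Subgradient descent is swapped in here for a quadratic-programming solver.
def stage2_weighted_hinge(X, A, w, lam=0.01, lr=0.1, epochs=200):
    n, p = len(X), len(X[0])
    beta, b = [0.0] * p, 0.0
    for _ in range(epochs):
        g_beta = [lam * bk for bk in beta]
        g_b = 0.0
        for xi, yi, wi in zip(X, A, w):
            f = sum(bk * xk for bk, xk in zip(beta, xi)) + b
            if yi * f < 1.0:  # inside the margin: hinge term is active
                for k in range(p):
                    g_beta[k] -= wi * yi * xi[k] / n
                g_b -= wi * yi / n
        beta = [bk - lr * gk for bk, gk in zip(beta, g_beta)]
        b -= lr * g_b
    return beta, b

def predict(beta, b, x):
    return 1 if sum(bk * xk for bk, xk in zip(beta, x)) + b >= 0 else -1

# Toy simulation (illustrative only): treatments coded +1/-1,
# and the true optimal treatment is sign(x1).
random.seed(0)
X = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(200)]
A = [random.choice([-1, 1]) for _ in range(200)]
R = [0.5 + a * x[0] + random.gauss(0, 0.2) for x, a in zip(X, A)]

w = stage1_probabilities(X, A, R)
beta, b = stage2_weighted_hinge(X, A, w)
optimal = [1 if x[0] >= 0 else -1 for x in X]
accuracy = sum(predict(beta, b, x) == o for x, o in zip(X, optimal)) / len(X)
print(f"agreement with the true optimal rule: {accuracy:.2f}")
```

Subjects whose assigned treatment appears suboptimal receive small stage-one probabilities, so they contribute little to the stage-two loss, and the fitted boundary should roughly follow x1 = 0 in this simulation.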