There is a fast-growing literature on estimating optimal treatment regimes based on a key identifying condition of no unmeasured confounding. Because confounding by unmeasured factors cannot generally be ruled out with certainty in observational studies or randomized trials subject to noncompliance, we propose a general instrumental variable (IV) approach to learning optimal treatment regimes under endogeneity. Specifically, we provide sufficient conditions for the identification of both the value function for a given regime and the optimal regime with the aid of a binary IV, when the assumption of no unmeasured confounding fails to hold. We also extend the proposed method to identify and estimate the optimal treatment regime among those who would comply with the assigned treatment under monotonicity. In this latter case, we establish the somewhat surprising result that the complier optimal regime can be consistently estimated without directly collecting compliance information and therefore without the complier average treatment effect itself being identified. Furthermore, we propose novel multiply robust classification-based estimators. Our approach is illustrated via simulations and a real data application.