In recent years, the increasing availability of individual-level data and the advancement of machine learning algorithms have led to an explosion of methodological development for finding optimal individualized treatment rules (ITRs). These new tools are being applied in a variety of fields, including business, medicine, and politics. However, few methods exist for empirically evaluating the efficacy of ITRs. In particular, many existing ITR estimators are based on complex models and do not come with statistical uncertainty estimates. We consider common real-world settings in which policy makers wish to predict the performance of a given ITR before administering it in a target population. We propose using a randomized experiment to evaluate ITRs. Unlike existing methods, the proposed methodology is based on Neyman's repeated sampling approach and does not require modeling assumptions. As a result, it is applicable to the empirical evaluation of ITRs derived from a wide range of statistical and machine learning models. We conduct simulation and real-world studies to demonstrate the accuracy of the proposed methodology in small samples.