Machine learning for heterogeneous treatment effects has become a fast-growing research area. A popular and useful sub-field is policy learning, in which treatment effect estimates are used to make optimal intervention assignments. As researchers develop advanced nonparametric causal inference algorithms, there is a need for model-free tools that allow policy-makers to make principled, actionable decisions based on algorithm output. We introduce a set of methods that allow practitioners to make valid statistical inferences in this setting, without imposing model restrictions. We propose specific hypotheses that correspond to real questions of interest about comparing policies, and give corresponding test statistics and asymptotic distributions, drawing on core statistical theory such as McNemar's test. Additionally, we discuss visual diagnostics and global tests over the set of costs for which a personalized policy improves on a random or uniform policy, and demonstrate performance in a simulation study.
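
As a minimal sketch of the McNemar-style paired comparison mentioned above: given, for each unit, a binary indicator of whether each of two policies assigned the treatment deemed optimal (here taken as given; the function name, inputs, and the use of the classical discordant-pair statistic are illustrative assumptions, not the paper's exact procedure), the test uses only the discordant pairs.

```python
import math

def mcnemar_policy_test(correct_a, correct_b):
    """Illustrative McNemar-style test comparing two policies.

    correct_a, correct_b: sequences of 0/1 indicating, per unit,
    whether each policy's assignment matched the benchmark-optimal
    treatment. Only discordant pairs (one policy right, the other
    wrong) carry information about which policy is better.
    Returns (chi-squared statistic with 1 df, asymptotic p-value).
    """
    b = sum(1 for x, y in zip(correct_a, correct_b) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(correct_a, correct_b) if x == 0 and y == 1)
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of a difference
    stat = (b - c) ** 2 / (b + c)         # classical McNemar statistic
    # Upper tail of chi-squared(1): P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p

# Toy usage: 10 units where only policy A is correct, 2 where only B is,
# 4 concordant units (which do not affect the statistic).
a = [1] * 10 + [0] * 2 + [1, 1, 0, 0]
b = [0] * 10 + [1] * 2 + [1, 1, 0, 0]
stat, p = mcnemar_policy_test(a, b)  # stat > 3.84, so p < 0.05
```

This is only the textbook large-sample form; the paper's asymptotic results may differ, e.g. by accounting for estimation error in the treatment effect estimates.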