In diagnostic test evaluation, a neglected area of research is determining targets for specificity, sensitivity, and other classification accuracy measures that would imply a test is clinically useful in practice. We determine classification accuracy goals based on desired risk stratification, i.e., the post-test risk of having the condition being diagnosed (negative or positive predictive value) compared with the pre-test risk (prevalence). For a rare condition, classification accuracy goals are especially attractive because they may be evaluable in a case-control study enriched for the condition, alleviating the burden of conducting a large cross-sectional or cohort study to obtain enough subjects with the condition. We distinguish between performance goals for tests intended to rule out, rule in, or do both. We emphasize goals for negative and positive likelihood ratios because of their natural relationship with risk stratification, but independent goals for specificity and sensitivity are also developed. We consider not just standalone evaluations of a diagnostic test, but also comparative evaluations of superiority and non-inferiority of a test to a comparator, using simple approximations to obtain comparative classification accuracy goals for risk differences and relative risks. For statistical inference, Wald confidence intervals are developed and applied to hypothetical data on a fetal fibronectin assay for ruling out risk of pre-term birth and hypothetical data on two human papillomavirus assays for detecting cervical cancer.
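The "natural relationship" between likelihood ratios and risk stratification referred to above is the standard odds form of Bayes' theorem: post-test odds equal pre-test odds times the likelihood ratio. The following is a minimal sketch of that calculation; the function names and the numerical values (10% prevalence, sensitivity 0.95, specificity 0.40) are illustrative assumptions, not figures from the paper.

```python
def post_test_risk(pre_test_risk, likelihood_ratio):
    """Post-test risk via Bayes' theorem in odds form.

    post-test odds = pre-test odds * likelihood ratio;
    risk = odds / (1 + odds).
    """
    pre_odds = pre_test_risk / (1.0 - pre_test_risk)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


def negative_lr(sensitivity, specificity):
    """Negative likelihood ratio: (1 - sensitivity) / specificity."""
    return (1.0 - sensitivity) / specificity


def positive_lr(sensitivity, specificity):
    """Positive likelihood ratio: sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)


if __name__ == "__main__":
    # Hypothetical rule-out test: pre-test risk (prevalence) 10%,
    # sensitivity 0.95, specificity 0.40, so LR- = 0.05 / 0.40 = 0.125.
    prevalence = 0.10
    lr_neg = negative_lr(0.95, 0.40)
    risk_after_negative = post_test_risk(prevalence, lr_neg)
    # A negative result stratifies risk downward from 10% to about 1.4%,
    # i.e., a negative predictive value of about 98.6%.
    print(f"LR- = {lr_neg:.3f}, post-negative-test risk = {risk_after_negative:.3f}")
```

A rule-out performance goal can then be phrased directly on the negative likelihood ratio (e.g., "LR- small enough that post-test risk falls below a clinically acceptable threshold"), which is why the abstract favors likelihood-ratio goals over raw sensitivity/specificity targets.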