Online Program

Classification accuracy of provider profiling methods based on Medicare claims

Rhondee Benjamin-Johnson, The Lewin Group 
Joshua J Fenton, University of California, Davis 
*Rebecca A Hubbard, Group Health Research Institute 
Tracy L Onega, Dartmouth Medical School 
Rebecca Smith-Bindman, University of California, San Francisco 
Weiwei Zhu, Group Health Research Institute 

Keywords: classification, empirical Bayes, Medicare, profiling, quality assessment

Assessing and reporting the comparative performance of health care providers is one of several approaches to meaningful quality measurement and is associated with improvements in provider performance. Data from Medicare, which insures 50 million Americans, are a potentially valuable source of information on provider performance. However, Medicare claims-based approaches to provider profiling need systematic evaluation to ensure that error arising from both random variability and the imperfect performance of claims-based measures does not produce erroneous estimates of provider performance. We were motivated by screening mammography quality improvement, for which several claims-based performance measures have been developed. Within that setting, we investigated claims-based approaches to classifying providers as failing to meet guideline performance thresholds. We examined classification accuracy as a function of the sensitivity and specificity of the claims-based algorithm, provider volume, and outcome prevalence. Using simulation studies, we calculated the sensitivity and specificity for identifying poorly performing providers under three statistical approaches: a method based on maximum likelihood point estimates alone, a maximum likelihood approach that incorporates uncertainty via confidence intervals, and an empirical Bayes approach. One simulated population had a mean outcome frequency of 8.5%, similar to the distribution of mammography recall rates. There, the empirical Bayes approach identified 72% of providers whose true rates exceeded a performance threshold of 12%. Another simulated population had a mean outcome frequency of just 6 per 1000, similar to the distribution of breast cancer detection rates. There, the same approach identified fewer than 1% of providers with true rates below 2 per 1000. The maximum likelihood-based approaches identified a higher proportion of poorly performing providers, but at the cost of erroneously flagging a higher proportion of adequate performers as poor performers.
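
The abstract does not give the simulation design. The following is therefore only a minimal Python sketch under assumed settings, and every setting below is hypothetical:

- True provider rates are drawn from a Beta(8.5, 91.5) distribution, which has a mean of 8.5%.
- Each provider has 200 exams.
- The threshold is 12%, and higher rates are worse.
- The confidence-interval rule uses a Wald interval.
- The empirical Bayes rule is a beta-binomial model with a method-of-moments prior, and it flags on the posterior mean.

The function names are also hypothetical. The sketch shows how each of the three rules turns claims-based counts, generated with a given algorithm sensitivity and specificity, into flags that can be scored against providers' true status.

```python
import math
import random
import statistics

def observed_prob(p_true, se, sp):
    """Chance the claims-based algorithm records an outcome for one exam,
    given the true outcome probability and the algorithm's sensitivity/specificity."""
    return se * p_true + (1.0 - sp) * (1.0 - p_true)

def simulate(n_providers, volume, a, b, se, sp, rng):
    """Hypothetical design: true rates ~ Beta(a, b); each provider has `volume`
    exams and a claims-based count ~ Binomial(volume, observed_prob)."""
    truth, counts = [], []
    for _ in range(n_providers):
        p = rng.betavariate(a, b)
        q = observed_prob(p, se, sp)
        truth.append(p)
        counts.append(sum(rng.random() < q for _ in range(volume)))
    return truth, counts

def flag_mle(counts, n, t):
    # Rule 1: flag if the maximum likelihood point estimate y/n exceeds t.
    return [y / n > t for y in counts]

def flag_ci(counts, n, t, z=1.96):
    # Rule 2 (assumed Wald interval): flag only if the lower 95% bound exceeds t.
    flags = []
    for y in counts:
        p = y / n
        flags.append(p - z * math.sqrt(p * (1 - p) / n) > t)
    return flags

def flag_eb(counts, n, t):
    # Rule 3 (assumed beta-binomial model, equal volumes): estimate the prior
    # by method of moments, then flag if the posterior mean exceeds t.
    rates = [y / n for y in counts]
    m = statistics.fmean(rates)
    if not 0 < m < 1:
        raise ValueError("mean observed rate must lie strictly between 0 and 1")
    # Var(y/n) = m(1-m)/n * (1 + (n-1)*rho), with rho = 1/(a+b+1)
    ratio = statistics.pvariance(rates) * n / (m * (1 - m))
    rho = min(max((ratio - 1) / (n - 1), 1e-6), 0.999)
    ab = 1 / rho - 1          # prior a + b
    a = m * ab                # prior a
    return [(a + y) / (ab + n) > t for y in counts]

def sens_spec(truth, flags, t):
    """Sensitivity and specificity of the flags against true status (p > t is poor)."""
    poor = [f for p, f in zip(truth, flags) if p > t]
    ok = [f for p, f in zip(truth, flags) if p <= t]
    sens = sum(poor) / len(poor) if poor else float("nan")
    spec = (len(ok) - sum(ok)) / len(ok) if ok else float("nan")
    return sens, spec

if __name__ == "__main__":
    rng = random.Random(1)
    t, n = 0.12, 200
    # Hypothetical algorithm accuracy: se=0.95, sp=0.98.
    truth, counts = simulate(500, n, 8.5, 91.5, se=0.95, sp=0.98, rng=rng)
    for name, rule in [("MLE", flag_mle), ("MLE + CI", flag_ci), ("EB", flag_eb)]:
        sens, spec = sens_spec(truth, rule(counts, n, t), t)
        print(f"{name:9s} sensitivity={sens:.2f} specificity={spec:.2f}")
```

The Wald lower bound never exceeds the point estimate. As a result, the confidence-interval rule flags a subset of the providers that the point-estimate rule flags. The empirical Bayes rule also flags a subset whenever the estimated prior mean lies below the threshold. This is one way to see the sensitivity/specificity trade-off described above. For a measure where lower is worse, such as cancer detection rate, the inequalities would be reversed.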
Although claims-based algorithms have the potential to classify providers accurately on a common outcome measure such as mammography recall rate, they showed poor accuracy for a rare outcome such as breast cancer detection rate. For a rare outcome, even slightly imperfect claims-based algorithm performance leads to poor classification accuracy in provider profiling. Any claims-based profiling approach must be validated and carefully assessed before it is considered for use in targeted quality improvement.
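
Why rare outcomes are fragile can be seen from the expected claims-based rate, which is true positives plus false positives: Se·p + (1 − Sp)(1 − p). The short Python sketch below evaluates this rate and the share of claims-flagged outcomes that are real (positive predictive value) at the two prevalences discussed above. The sensitivity of 0.90 and specificity of 0.99 are hypothetical values chosen for illustration, not accuracies reported in this study.

```python
def observed_rate(p, se, sp):
    """Expected claims-based outcome rate: true positives plus false positives."""
    return se * p + (1 - sp) * (1 - p)

def ppv(p, se, sp):
    """Share of claims-flagged outcomes that are true outcomes."""
    return se * p / observed_rate(p, se, sp)

# Hypothetical algorithm accuracy; not values reported in this study.
SE, SP = 0.90, 0.99
for label, p in [("recall, 8.5%", 0.085), ("detection, 6 per 1000", 0.006)]:
    print(f"{label}: observed={observed_rate(p, SE, SP):.4f}, PPV={ppv(p, SE, SP):.2f}")
```

These assumed values give the following results for each outcome:

- **Recall rate:** the PPV stays near 0.89.
- **Detection rate:** the observed rate is about 15.3 per 1000, more than double the true 6 per 1000, and only about 35% of flagged detections are true cancers.
- **A provider whose true detection rate is 2 per 1000:** the observed rate is about 11.8 per 1000, which is higher than the population's true mean. In this example, false positives largely mask the low-performing providers the profile is meant to find.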