Online Program

Friday, February 16
CS10 Propensity Scores and Resampling Methods Fri, Feb 16, 2:00 PM - 3:30 PM
Salons BC

Resampling Methods for Statistical Inference on Multi-Rater Kappas (303492)

*Chia-Ling Kuo, University of Connecticut Health 

Keywords: Cohen’s kappa, Fleiss’ kappa, Randolph’s kappa, free-marginal multi-rater kappa, kappa comparison

The kappa statistic quantifies agreement between raters on a categorical or ordinal outcome. For two raters, it is typically referred to as Cohen’s kappa or Fleiss’ kappa. Fleiss’ kappa is a generalization of Scott’s pi statistic and can be used for multiple raters (n ≥ 2). Light’s kappa and Hubert’s kappa are multi-rater versions of Cohen’s kappa. All of these kappas are susceptible to the “high-agreement-but-low-kappa” paradoxes (Feinstein and Cicchetti, 1990): 1) kappa can be much lower than the observed proportion of agreement when the raters’ marginal totals are unbalanced; 2) kappa is higher under an asymmetrical imbalance than under a symmetrical one. Randolph’s kappa is a multi-rater kappa that is not affected by these paradoxes, but no distributional theory is available to support statistical inference for it. In this work, I focus on Randolph’s kappa and propose resampling methods to calculate confidence intervals for individual kappas and to compare two independent kappas, that is, univariate and bivariate statistical inference. The resampling methods are validated by simulation and applied to real data for demonstration.
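The abstract does not spell out the resampling scheme, so the sketch below is only a rough illustration, not the method proposed in the talk: it computes Randolph’s free-marginal kappa from an N-subjects-by-k-categories count matrix (the Fleiss-style observed agreement with chance agreement fixed at 1/k) and uses a plain percentile bootstrap, resampling subjects with replacement, for a single-kappa confidence interval and for the difference of two independent kappas. All function names and defaults here are hypothetical.

```python
import numpy as np

def randolph_kappa(counts):
    """Randolph's free-marginal multi-rater kappa.

    counts: (N, k) array; counts[i, j] = number of raters who assigned
    subject i to category j. Each row sums to n, the number of raters
    (assumed constant across subjects, n >= 2).
    """
    counts = np.asarray(counts, dtype=float)
    n = counts[0].sum()                     # raters per subject
    k = counts.shape[1]                     # number of categories
    # Per-subject pairwise agreement, averaged over subjects
    p_o = (((counts ** 2).sum(axis=1) - n) / (n * (n - 1))).mean()
    p_e = 1.0 / k                           # free-marginal chance agreement
    return (p_o - p_e) / (1.0 - p_e)

def bootstrap_ci(counts, level=0.95, n_boot=2000, seed=0):
    """Percentile bootstrap CI for one kappa: resample subjects."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)
    N = counts.shape[0]
    stats = np.array([
        randolph_kappa(counts[rng.integers(0, N, size=N)])
        for _ in range(n_boot)
    ])
    alpha = 1.0 - level
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])

def bootstrap_diff_ci(counts_a, counts_b, level=0.95, n_boot=2000, seed=0):
    """Percentile bootstrap CI for the difference of two independent
    kappas: resample each sample's subjects independently."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(counts_a), np.asarray(counts_b)
    diffs = np.array([
        randolph_kappa(a[rng.integers(0, a.shape[0], size=a.shape[0])])
        - randolph_kappa(b[rng.integers(0, b.shape[0], size=b.shape[0])])
        for _ in range(n_boot)
    ])
    alpha = 1.0 - level
    return np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
```

Under this reading, a difference interval that excludes zero corresponds to rejecting equality of the two independent kappas at the chosen level.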