Abstract:
|
In a high school championship diving competition, each athlete performs 11 dives. Each of 5 judges gives each dive a rating from 0, for a failed dive, to 10, for a perfect dive. The round score for a dive is determined by dropping the lowest and highest judges' ratings, adding the remaining three ratings, and multiplying that sum by the degree of difficulty of the dive. The total score is the sum of the round scores. Agreement among multiple raters has been well explored in the literature. However, measuring agreement with multiple raters, multiple scores per rater, and covariates influencing the ratings has been much less explored. Data from a regional high school diving competition have all of these components. We examine various measures of interrater agreement from the literature and discuss the shortcomings of each in the context of this data set.
|
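A minimal sketch of the scoring rule described in the abstract, assuming the standard drop-high/drop-low procedure; the ratings and degree of difficulty below are hypothetical, not taken from the data set:

    def round_score(ratings, degree_of_difficulty):
        """Drop the lowest and highest of the five judges' ratings,
        sum the remaining three, and scale by degree of difficulty."""
        kept = sorted(ratings)[1:-1]  # middle three of the five ratings
        return sum(kept) * degree_of_difficulty

    # Hypothetical example: one dive rated by five judges
    ratings = [6.5, 7.0, 7.0, 7.5, 8.0]
    dd = 2.2
    score = round_score(ratings, dd)  # (7.0 + 7.0 + 7.5) * 2.2 = 47.3

    # The total score for an athlete is the sum of the 11 round scores.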