Abstract:
|
Imaging technologies that rely on interpretation by trained professionals are typically evaluated in fully crossed multi-reader studies. In these settings, the multi-sample bootstrap can be used to account for factor-specific variability and correlation, and to perform simple asymptotic inference for non-linear summary measures such as the AUC. Previous studies tend to disfavor the multi-reader bootstrap approach because of the substantial upward bias of the estimated variance. However, the relative properties of the corresponding statistical inference are not known. We developed a general approach for structuring the bias of the multi-reader bootstrap variance and proved that eliminating most of the bias leads to the current standard-of-practice test. A simulation study shows that the resulting nearly unbiased variance estimator requires the use of a t-distribution to control the type I error in statistical testing, which in some settings compromises power. The bootstrap's upward bias plays a protective role in Wald-type inference, which enables higher power when between-reader variability is high. Additional gain can be achieved, without compromising the type I error, by eliminating only part of the bias.
|