Multiple Hypothesis Testing: Graphical methods for sequential rejective tests and tests for correlated sensitivity statistics with application to diagnostic imaging devices
*Berkman Sahiner, USFDA/CDRH/OSEL 

Keywords: Multiple hypothesis testing, graphical methods

Many clinical studies are designed to test multiple objectives, which necessitates a multiple-hypothesis testing framework whose goal is to keep the familywise error rate (FWER, the probability of making one or more false rejections) no greater than a pre-specified significance level α. Study objectives are often organized in a hierarchy of families of primary, secondary, and tertiary hypotheses that are tested in sequential order. For example, clinically we may want to test a secondary hypothesis only if a certain set of primary hypotheses is rejected. A variety of sequentially rejective test procedures, such as gatekeeping procedures based on Bonferroni adjustments, have been proposed in the literature to address multiple study objectives while controlling the FWER. However, when the number of families and the number of hypotheses within each family are large, designing the procedure may become cumbersome, and it may be difficult to convey the underlying test strategy to non-statisticians by means of lengthy tables. Graphical methods were introduced to simplify the statistical design of studies with multiple objectives, to ease communication among design team members and with others, and to allow rapid exploration of different study design options. In this approach, the elementary hypotheses are represented by a set of nodes with associated weights that represent local significance levels. The nodes are connected by directed edges, and the edge weights specify how the local significance levels change when the hypothesis at a node is rejected. In the first part of this presentation, we will explore how this technique works through several examples. In the second part, we will introduce a new technique that we have recently developed to improve the power of statistical tests designed to assess computer-aided diagnosis algorithms with multiple operating points.
When an algorithm with an ordinal score is assessed at multiple operating points, the sensitivities at those operating points are correlated. Although the correlation is not known a priori, our study indicates that it can be estimated from the study data with enough reliability that methods developed in the literature for testing correlated multiple hypotheses can be applied. Our simulation results indicate that the proposed method can yield gains in power of up to 20% compared with methods that ignore the correlations.
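
To make the graphical approach described above concrete, the following is a minimal Python sketch of a sequentially rejective procedure on a weighted graph. It is an illustration under stated assumptions, not the implementation used in the presentation: the function name graphical_test, the four-hypothesis layout (two primary, two secondary), the edge weights, and the p-values are all hypothetical, and the edge-rewiring rule is the one commonly published for this class of procedures.

```python
def graphical_test(p_values, weights, transitions, alpha):
    """Illustrative sequentially rejective graphical test (hypothetical helper).

    p_values    : unadjusted p-values, one per elementary hypothesis
    weights     : initial node weights (fractions of alpha), summing to at most 1
    transitions : matrix g, where g[i][j] is the share of node i's weight
                  passed to node j when hypothesis i is rejected
    alpha       : overall familywise significance level
    Returns the sorted indices of the rejected hypotheses.
    """
    w = list(weights)
    g = [list(row) for row in transitions]
    active = set(range(len(p_values)))
    rejected = []
    while True:
        # Look for an active hypothesis whose p-value meets its local level.
        hit = next((i for i in sorted(active) if p_values[i] <= w[i] * alpha), None)
        if hit is None:
            break
        active.remove(hit)
        rejected.append(hit)
        # Pass the rejected node's weight along its outgoing edges.
        for l in active:
            w[l] += w[hit] * g[hit][l]
        w[hit] = 0.0
        # Rewire the remaining edges so paths through the removed node survive.
        new_g = [row[:] for row in g]
        for l in active:
            for k in active:
                if l == k:
                    continue
                denom = 1.0 - g[l][hit] * g[hit][l]
                new_g[l][k] = (g[l][k] + g[l][hit] * g[hit][k]) / denom if denom > 0 else 0.0
        g = new_g
    return sorted(rejected)


# Hypothetical design: H0, H1 primary (alpha split equally); H2, H3 secondary.
# Rejecting H0 frees its level for H2, and rejecting H2 passes it on to H1
# (symmetrically for H1 -> H3 -> H0).
G = [
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
]
p = [0.01, 0.04, 0.02, 0.30]  # made-up p-values for illustration
rejected = graphical_test(p, [0.5, 0.5, 0.0, 0.0], G, alpha=0.05)
```

With these made-up inputs the procedure rejects H0 at level 0.025, then H2 at the level inherited from H0, and finally H1 at the full level 0.05, so rejected is [0, 1, 2]. A fixed Bonferroni split of α between the two primary hypotheses would have rejected only H0, which shows how propagating significance levels along the edges can recover power while the hierarchy is respected.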