Abstract:
|
A key goal in flow cytometry data analysis is to identify cell types associated with disease. To this end, we compare the case and control distributions of a cell property and locate the regions where the two distributions differ. For a given property, we partition the joint range of the two distributions into bins and, in each bin, test the null hypothesis that the two distributions do not differ there. Only a few bins are expected to be non-null. To identify these bins while controlling the overall false discovery rate (FDR), we present a hierarchical multiple testing framework that performs multiple testing in several layers. Layer 1 uses the smallest bins to detect strong differences over narrow regions. In higher layers, we merge adjacent bins declared null in lower layers to detect weak differences over wide regions. Hypotheses are nested across layers, so every bin identified in a higher layer maps back to the layer-1 bins it contains. We prove that, under mild conditions, our method asymptotically controls the overall FDR. Extensive simulations show that it outperforms competing methods. We then apply it to identify cells involved in frequency differences between two blood samples from a single patient.
|
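As a rough illustration of the layered idea, the Python sketch below works on one-dimensional bin counts for a case and a control sample. It is a simplified sketch, not the procedure analyzed in this work: the per-bin two-sided two-proportion z-test, the Benjamini-Hochberg step run at level alpha / n_layers in every layer, and the pairwise merging rule (a null group with no contiguous partner is dropped) are all assumptions chosen for brevity, and this simplification carries no guarantee of overall FDR control. The helper names are hypothetical. Each tested group is stored as a list of layer-1 bin indices, which mirrors how discoveries in higher layers map back to layer 1.

```python
import math
from typing import List, Sequence, Tuple


def two_proportion_pvalue(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided z-test of equal bin proportions (assumed test, for illustration)."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # degenerate bin (empty or full in both samples)
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def benjamini_hochberg(pvals: Sequence[float], alpha: float) -> List[bool]:
    """Standard BH step-up: reject the k smallest p-values, k = max{r : p_(r) <= alpha*r/m}."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


def merge_adjacent(groups: List[List[int]]) -> List[List[int]]:
    """Pair up contiguous null groups; a group with no contiguous partner is dropped
    (hypothetical rule, so no hypothesis is re-tested unchanged)."""
    merged, i = [], 0
    while i + 1 < len(groups):
        if groups[i][-1] + 1 == groups[i + 1][0]:
            merged.append(groups[i] + groups[i + 1])
            i += 2
        else:
            i += 1
    return merged


def hierarchical_bins(case: Sequence[int], control: Sequence[int],
                      alpha: float = 0.05, n_layers: int = 3) -> List[Tuple[int, List[int]]]:
    """Return (layer, layer-1 bin indices) for every rejected group."""
    n1, n2 = sum(case), sum(control)
    groups = [[i] for i in range(len(case))]  # layer 1: the smallest bins
    discoveries = []
    for layer in range(1, n_layers + 1):
        pvals = [two_proportion_pvalue(sum(case[i] for i in g), n1,
                                       sum(control[i] for i in g), n2)
                 for g in groups]
        rejected = benjamini_hochberg(pvals, alpha / n_layers)
        nulls = []
        for g, r in zip(groups, rejected):
            if r:
                discoveries.append((layer, g))
            else:
                nulls.append(g)
        groups = merge_adjacent(nulls)  # only null bins are merged upward
        if not groups:
            break
    return discoveries


if __name__ == "__main__":
    case = [10, 10, 40, 10, 10, 10, 10, 10]
    control = [10] * 8
    print(hierarchical_bins(case, control))
```

With eight bins in which only bin 2 is enriched in the case sample, this sketch reports that single layer-1 bin, `[(1, [2])]`; the wider merged regions tested in layers 2 and 3 are not rejected.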