Abstract:
|
Consider a finite set of unknown probability distributions, f_1, f_2, ..., f_K, that serve as components of two potentially different mixture distributions: m_A = sum_k p_k f_k and m_B = sum_k q_k f_k, for unknown probability vectors p=(p_k) and q=(q_k). We observe labeled samples from these mixtures (i.e., random draws for which both the component label and the value are observed), and we aim to test the null hypothesis of equal marginal distributions (i.e., m_A = m_B). Curiously, the margins may be equal even when p differs from q. Motivated by a problem in the analysis of single-cell RNA-Seq data, we obtain a formula for the posterior probability of the null hypothesis when the components (1) belong to a parametric family, and (2) for some unknown partition of {1,2,...,K} into J blocks, are identical within blocks and different between blocks. This formula anchors a powerful methodology for identifying genes that exhibit distributional changes between cellular conditions. Numerical experiments demonstrate improved operating characteristics of this new methodology compared to gene-at-a-time inference procedures.
|
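The remark that the margins can be equal even when p differs from q can be checked with a small numerical sketch. Everything below is an illustrative assumption rather than part of the paper: K = 3 hypothetical discrete components on the support {0, 1, 2}, with f_1 and f_2 identical (one block) and f_3 different (a second block). Because p and q assign the same total mass to each block, the two mixtures coincide.

```python
# Hypothetical discrete components on the support {0, 1, 2}.
# f_1 and f_2 are identical (same block); f_3 forms its own block.
f = [
    [0.5, 0.3, 0.2],  # f_1
    [0.5, 0.3, 0.2],  # f_2 (identical to f_1)
    [0.1, 0.2, 0.7],  # f_3
]

# Mixing proportions differ, but each block carries the same total mass:
# block {1, 2} has mass 0.5 and block {3} has mass 0.5 under both p and q.
p = [0.2, 0.3, 0.5]
q = [0.4, 0.1, 0.5]


def mixture(weights, components):
    """Return the mixture pmf sum_k weights[k] * components[k]."""
    support_size = len(components[0])
    return [
        sum(w * comp[x] for w, comp in zip(weights, components))
        for x in range(support_size)
    ]


m_A = mixture(p, f)
m_B = mixture(q, f)

# Compare with a tolerance to allow for floating-point rounding.
equal_margins = all(abs(a - b) < 1e-12 for a, b in zip(m_A, m_B))
print("p differs from q:", p != q)
print("m_A equals m_B:", equal_margins)
```

In this toy setting, equality of the margins depends only on the block totals of p and q, which is why the unknown partition into blocks enters the posterior probability of the null hypothesis.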