Abstract:
|
Modern multiplatform genomics datasets comprise high-throughput data of intrinsically different sizes and scales, offering genome-wide, high-resolution information about the molecular processes underlying various types of cancer. One of the main analytical goals is to identify differential genomic signatures among samples under different treatments or biological conditions, e.g., treatment arms, tumor (sub)types, or cancer stages. We construct a general class of hierarchical Bayesian nonparametric models based on Poisson-Dirichlet processes (called PDP-Seq) that are applicable to mixed, heterogeneously scaled datasets. The construction encompasses diverse parametric and nonparametric models, incorporates a wide range of data structures, and can be used as the model at multiple levels of the hierarchy to borrow strength. In particular, first-order processes are constructed to accommodate the correlation between neighboring (spatial) units, a common feature of such data. In simulation studies, PDP-Seq identifies genomic signatures more accurately than existing inference techniques in high-throughput sequencing and methylation data.
|