Abstract:
Array-based and next-generation sequencing platforms generate 'omics datasets of intrinsically different sizes and scales, providing genome-wide, high-resolution information about the biology of lung cancer. A common goal is to identify genomic signatures that differ between samples from different treatments or biological conditions, e.g., treatment arms, tumor (sub)types, or cancer stages. We construct a broad class of nonparametric models, called generalized Poisson-Dirichlet processes (g-PDPs), that are applicable to mixed, heterogeneously scaled datasets. For each platform, the model can be chosen from a diverse set of parametric and nonparametric models, including finite mixtures, finite and infinite hidden Markov models, Dirichlet processes, and zero- and first-order PDPs, which together cover a broad range of data correlation structures. Simulation studies demonstrate that g-PDPs outperform many existing techniques in the accuracy of signature identification. Pathway analysis identified upstream regulators of many genes that are common genetic markers in multiple tumor cells.
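The abstract does not specify the g-PDP construction itself. For orientation only, the sketch below shows the standard truncated stick-breaking construction of the two-parameter Poisson-Dirichlet (Pitman-Yor) process, which reduces to the Dirichlet process when the discount is zero. This is not the authors' g-PDP model; the function names, the truncation level, and the parameter names (`discount`, `concentration`) are illustrative assumptions.

```python
import random


def pdp_stick_breaking(discount, concentration, truncation, seed=None):
    """Truncated stick-breaking weights for a two-parameter Poisson-Dirichlet
    (Pitman-Yor) process PY(discount, concentration).

    Hypothetical helper for illustration. discount = 0 gives the Dirichlet
    process. The last stick absorbs all remaining mass so weights sum to 1.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    if concentration <= -discount:
        raise ValueError("concentration must exceed -discount")
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for k in range(1, truncation + 1):
        if k == truncation:
            v = 1.0  # truncation: assign all leftover mass to the last atom
        else:
            # V_k ~ Beta(1 - d, alpha + k * d)
            v = rng.betavariate(1.0 - discount, concentration + k * discount)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights


def assign_labels(weights, n, seed=None):
    """Hypothetical helper: draw n cluster labels from the mixture weights."""
    rng = random.Random(seed)
    return rng.choices(range(len(weights)), weights=weights, k=n)


if __name__ == "__main__":
    w = pdp_stick_breaking(discount=0.5, concentration=1.0, truncation=50, seed=1)
    labels = assign_labels(w, n=100, seed=2)
    print("occupied clusters:", len(set(labels)))
```

A positive discount makes the weights decay more slowly than under the Dirichlet process, so more clusters tend to be occupied as the sample grows; this power-law behavior is one reason PDPs are used alongside Dirichlet processes.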
ASA Meetings Department
Copyright © American Statistical Association.