Abstract:
|
Two large-scale pharmacogenomic studies published in 2012, the Cancer Genome Project (CGP) and the Cancer Cell Line Encyclopedia (CCLE), have 15 drugs and 698 cell lines in common, with the cell lines originating from 23 tissue types. The genomic data in this intersection were found to be highly consistent between the studies, but the measured drug response data are highly discordant. This raises questions about how to validate the data independently and how to identify trustworthy information that can be translated to in vivo response. We develop a data-shared approach for the generalized finite mixture of multivariate regression (FMMR) model to learn from the two data sets. The generalized FMMR model accommodates overlapping clustering, which allows a more sophisticated clustering of each data set. Simulation studies show that the data-shared approach improves prediction and accurately estimates the disparities in coefficients between the data sets. The new approach provides a way to validate the two discordant high-throughput drug screening data sets and produces estimates of both the shared structure and the variations between them.
|