Abstract:
Cancer cell lines have frequently been used to link drug sensitivity with genomic profiles. The drug screen of Garnett et al. (2012) profiled 639 human tumor cell lines against 130 cancer drugs. Discovering drug-sensitivity biomarkers across such a wide range of cancer drugs poses three challenges: (i) drug-sensitivity biomarkers may be cluster-specific, differing across groups of cancer cell lines; (ii) clusters can overlap (i.e., a cell line may belong to multiple clusters); and (iii) it is unclear how the clustering is shaped when many drugs are considered simultaneously. In this paper, we put forward a model-based overlapping clustering framework to address these challenges. Specifically, we introduce a multivariate regression model with a latent overlapping cluster indicator variable. To fit this model, we propose a revised finite mixture of multivariate regressions (FMMR) model together with an expectation-maximization (EM) algorithm. Model parameters are estimated by penalized likelihood with a lasso or elastic-net penalty, which performs variable selection at the same time. Simulation studies show excellent finite-sample performance, with the proposed method outperforming competing methods. Analysis of the drug data reveals complex overlapping clusters as well as cluster-wise drivers.