Abstract:
|
Gene expression data have been used extensively to cluster samples, and the resulting clusters can serve as the basis for disease subtype identification and risk stratification. Because genetic profiling studies have small sample sizes and gene expression data are noisy, clustering results are often unsatisfactory. A prominent trend in recent studies is multidimensional profiling, which collects data on gene expressions as well as their regulators from the same subjects. We develop a novel assisted clustering method that uses regulator information to improve clustering analysis based on gene expression data. To account for the fact that not all gene expressions are informative, we propose a weighting strategy in which the weights are determined in a data-dependent manner and can discriminate informative gene expressions from noise. The proposed method is built on the NCut technique and is realized using a simulated annealing algorithm. In simulations, it outperforms multiple direct competitors. In the analysis of TCGA melanoma data, it leads to biologically sensible findings that differ from those of the alternatives.
|