Abstract:
We propose a multi-scale Shannon Profile, built on a cluster-based profile approach, to quantify tumor heterogeneity. We then devise a D statistic, defined from the area under the Profile of Shannon Difference (PSD) between two populations, to quantify the difference in heterogeneity between them. A multivariate adaptive regression splines (MARS) model is used to detect change points in the PSD and thereby determine the default number of phenotypic clones. In addition to individual comparisons, we develop a combined score, the Generalized Fisher Product Score (GF), to prioritize biomarkers for further investigation of heterogeneity. As a proof of principle, we applied the method to a published single-cell gene expression dataset. Heterogeneity was statistically significantly higher in EGFR-mutant lung cancer tumors than in EGFR wild-type tumors (D = -63.8, p < 0.001). Using the GF score, we recovered the previously reported epithelial cell markers and also identified novel population-specific biomarkers. The approach provides insight into clones that emerge or disappear between populations or states.
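The abstract does not give formulas, so the following Python sketch rests on stated assumptions rather than the authors' definitions: the Shannon Profile is taken to be the Shannon entropy of cluster-size proportions at each clustering scale k, the PSD is the pointwise difference between two populations' profiles, and D is the area under that difference computed with the trapezoid rule. Cluster labels at each scale are assumed to be precomputed. The classical (unweighted) Fisher combination of p-values stands in for the Generalized Fisher Product Score, whose generalization is not described here, and the MARS change-point step is not sketched. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence

# Hypothetical helpers; the definitions below are assumptions, not the paper's method.


def shannon_entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (natural log) of cluster-size proportions for one clustering."""
    n = len(labels)
    counts = Counter(labels)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return abs(h)  # turn -0.0 (single cluster) into 0.0; h is never truly negative


def shannon_profile(labels_by_scale: Sequence[Sequence[int]]) -> List[float]:
    """Assumed profile: entropy at each clustering scale k = 1, 2, ..., K."""
    return [shannon_entropy(labels) for labels in labels_by_scale]


def d_statistic(profile_a: Sequence[float], profile_b: Sequence[float]) -> float:
    """Assumed D: trapezoid-rule area under (profile_a - profile_b), unit spacing in k."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same scales")
    diff = [a - b for a, b in zip(profile_a, profile_b)]
    return sum((diff[i] + diff[i + 1]) / 2 for i in range(len(diff) - 1))


def fisher_combined_p(p_values: Sequence[float]) -> float:
    """Classical Fisher method (stand-in for GF): X = -2 * sum(ln p) ~ chi-square(2m)."""
    if not p_values or any(not 0 < p <= 1 for p in p_values):
        raise ValueError("need at least one p-value in (0, 1]")
    m = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # X / 2
    # Exact chi-square survival function for even degrees of freedom 2m.
    return math.exp(-half_x) * sum(half_x ** j / math.factorial(j) for j in range(m))


if __name__ == "__main__":
    # Toy cluster labels for 6 cells at scales k = 1, 2, 3 (made-up data).
    pop_a = [[0] * 6, [0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]]
    pop_b = [[0] * 6, [0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 1, 2]]
    d = d_statistic(shannon_profile(pop_a), shannon_profile(pop_b))
    print(f"D = {d:.3f}")
    print(f"combined p = {fisher_combined_p([0.04, 0.10, 0.20]):.4f}")
```

In the toy data, population A splits into evenly sized clusters at every scale while population B is dominated by one cluster, so D comes out positive. Swapping the order of the two populations flips the sign of D, which is why the sign of a reported D depends on which population is subtracted from which.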