Abstract:
Cluster analysis is used in cancer research to discover molecular subgroups that inform subsequent laboratory investigations of the mechanisms of oncogenesis and define risk classification criteria for subsequent clinical trials. However, for any data set, there is a very large number of candidate cluster analysis methods (CCAMs), which differ in arbitrary choices of feature selection criteria, the number of features to use in the cluster analysis, distance metric, agglomeration algorithm, etc. Here, we propose the Dunn Index Bootstrap (DIBS) as a procedure that quantifies the statistical robustness of a large pool of CCAMs in terms of the reproducibility of their results and the distinctiveness of their assignment of subjects into clusters. To study its performance, DIBS was applied to several microarray gene expression, RNA-seq gene expression, and methylation data sets from various cancers, where it was used to select the optimal CCAM from among more than 4,000 candidates. In each of these examples, DIBS selected a CCAM that defined clinically relevant subgroups on the basis of biologically relevant features. The DIBS procedure is being implemented as an R package.
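As an illustration of the "distinctiveness" component, the sketch below computes the standard Dunn index: the smallest distance between subjects in different clusters divided by the largest distance between subjects in the same cluster. It is not the DIBS procedure or the R package itself. The abstract does not specify how DIBS combines this index with bootstrap resampling, so that step is left out, and the function name `dunn_index`, the Euclidean distance, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def dunn_index(points, labels):
    """Standard Dunn index (illustrative; not the DIBS implementation).

    Ratio of the smallest between-cluster distance to the largest
    within-cluster diameter. Larger values mean more distinct clusters.
    """
    # Group subjects by their assigned cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")

    # Largest distance between two subjects in the same cluster.
    max_diameter = max(
        (dist(a, b)
         for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,  # every cluster is a singleton
    )

    # Smallest distance between two subjects in different clusters.
    min_separation = min(
        dist(a, b)
        for first, second in combinations(clusters.values(), 2)
        for a in first
        for b in second
    )

    if max_diameter == 0.0:
        return float("inf")
    return min_separation / max_diameter


# Toy data (hypothetical): two tight pairs of subjects, far apart.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good = dunn_index(points, ["a", "a", "b", "b"])  # labels match the structure
bad = dunn_index(points, ["a", "b", "a", "b"])   # labels cut across it
print(good, bad)
```

On the toy data, the assignment that matches the two pairs scores higher than the one that splits them, which is the sense in which a larger Dunn index indicates a more distinctive clustering.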
ASA Meetings Department
732 North Washington Street, Alexandria, VA 22314
(703) 684-1221 • meetings@amstat.org
Copyright © American Statistical Association.