Activity Number: 410
Type: Contributed
Date/Time: Thursday, August 15, 2002 : 10:30 AM to 12:20 PM
Sponsor: Biometrics Section*
Abstract - #301593
Title: A Method to Identify Significant Clusters in Gene Expression Data
Author(s): Katherine Pollard*+ and Mark van der Laan
Affiliation(s): University of California, Berkeley and University of California, Berkeley
Address: School of Public Health, Earl Warren Hall #7360, Berkeley, California, 94720-7360, USA
Keywords: clustering ; silhouette ; homogeneity ; gene expression
Abstract:
Clustering algorithms have been widely applied to gene expression data. For both hierarchical and partitioning clustering algorithms, selecting the number of significant clusters is an important problem, and many methods have been proposed. Existing methods for selecting the number of clusters tend to find only the global patterns in the data (e.g., the over- and under-expressed genes). We have noted the need for a better method in the gene expression context, where small, biologically meaningful clusters can be difficult to identify. We define a new criterion, Mean Split Silhouette (MSS), which is a measure of cluster heterogeneity. We propose to choose the number of clusters as the minimizer of MSS. In this way, the number of significant clusters is defined as that which produces the most homogeneous clusters. The power of this method compared to existing methods is demonstrated on simulated and real microarray data. The minimum MSS method is an example of a general approach that can be applied to any clustering routine with any global criterion. The key idea is to assess each cluster separately using a measure of heterogeneity.
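The abstract does not give a formula for MSS, so the following Python sketch rests on an assumed definition: each cluster is itself split into sub-clusters, the best average silhouette over those splits is taken as the cluster's "split silhouette" (a high value suggests the cluster is heterogeneous), and MSS is the mean of these values over all clusters. The clustering routine here is a plain k-means with farthest-point seeding, chosen only for illustration. All function names, the `max_split` and `k_max` parameters, and the convention of scoring very small clusters as 0 are hypothetical and are not taken from the abstract.

```python
import math

def kmeans(points, k, max_iter=100):
    """Plain k-means with deterministic farthest-point seeding (illustrative stand-in)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def group_by_label(points, labels):
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return groups

def mean_silhouette(points, labels):
    """Average silhouette width; singleton clusters contribute 0."""
    groups = group_by_label(points, labels)
    if len(groups) < 2:
        return 0.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = groups[lab]
        if len(own) == 1:
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in g) / len(g)
                for other, g in groups.items() if other != lab)
        total += (b - a) / (max(a, b) or 1.0)
    return total / len(points)

def split_silhouette(cluster, max_split=3):
    """Assumed heterogeneity score: best average silhouette over splits of one cluster."""
    candidates = [mean_silhouette(cluster, kmeans(cluster, l))
                  for l in range(2, min(max_split, len(cluster) - 1) + 1)]
    # Assumption: clusters too small to split are scored as 0.
    return max(candidates) if candidates else 0.0

def mean_split_silhouette(points, labels, max_split=3):
    groups = group_by_label(points, labels)
    return sum(split_silhouette(g, max_split) for g in groups.values()) / len(groups)

def select_k(points, k_max, max_split=3):
    """Return the k in 2..k_max that minimizes the assumed MSS, plus all scores."""
    scores = {k: mean_split_silhouette(points, kmeans(points, k), max_split)
              for k in range(2, k_max + 1)}
    return min(scores, key=scores.get), scores

# Toy data: three well-separated 3x3 grids of points.
blob = [(x, y) for x in range(3) for y in range(3)]
points = [(x + dx, y + dy) for dx, dy in [(0, 0), (100, 0), (0, 100)] for x, y in blob]
best_k, scores = select_k(points, k_max=5)
```

On this toy data the two-cluster solution merges two of the grids, so one of its clusters splits very cleanly and its score is higher than that of the three-cluster solution. This illustrates the key idea of assessing each cluster separately, but it says nothing about how the criterion behaves on real microarray data.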
- The address information is for the authors that have a + after their name.
- Authors who are presenting talks have a * after their name.