Activity Number: 410
Type: Contributed
Date/Time: Thursday, August 15, 2002, 10:30 AM to 12:20 PM
Sponsor: Biometrics Section*
Abstract - #301432
Title: Simultaneous Gene Clustering and Subset Selection for Classification via MDL
Author(s): Rebecka Jornsten*+
Affiliation(s): Rutgers University
Address: 501 Hill Center, Busch Campus, Piscataway, New Jersey, 08854, USA
Keywords: clustering; classification; MDL; gene expression
Abstract:
Gene clustering and sample classification are two important tasks in the analysis of gene expression data. Most current approaches treat these tasks separately or sequentially, with information flowing in one direction only. We view the clustering of genes, and the subsequent selection of clusters that discriminate between classes, as a single model selection problem. We use Rissanen's minimum description length (MDL) principle, which formalizes Occam's razor. The description length consists of two parts. The first part describes the gene clustering and is based on a Gaussian mixture model. The second part describes the sample class labels. By minimizing the combined coding rate of the two parts, we allow the clustering to be influenced by the classification; likewise, the subset selection is influenced by the clustering. For the first time, a description length for the explanatory variables is included in an MDL selection criterion. We apply our MDL criterion for simultaneous gene clustering and subset selection to several gene expression data sets and obtain competitive test error rates. We also present results from an extensive simulation study.
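The idea of scoring a gene clustering and a cluster subset with one two-part code length can be illustrated with a toy sketch. This is not the author's criterion, and every function name is hypothetical. Part 1 assumes a hard-assignment spherical Gaussian mixture with one shared variance and a BIC-style (k/2) log2 N parameter cost. Part 2 swaps in a simple stand-in for the label code: a one-dimensional threshold rule on the average of the selected cluster centroids, with misclassified samples coded enumeratively. Rather than searching over clusterings, it scores a few fixed candidate clusterings, each paired with its best cluster subset.

```python
import math
from itertools import combinations

# Hypothetical toy criterion; NOT the method from the abstract.

def centroids_of(X, labels, K):
    """Mean expression profile of each gene cluster (zeros for an empty cluster)."""
    n = len(X[0])
    sums = [[0.0] * n for _ in range(K)]
    counts = [0] * K
    for row, k in zip(X, labels):
        counts[k] += 1
        for j, v in enumerate(row):
            sums[k][j] += v
    return [[s / c if c else 0.0 for s in sums[k]] for k, c in enumerate(counts)]

def clustering_bits(X, labels, K):
    """Part 1 (assumed form): bits for gene profiles under a hard-assignment
    spherical Gaussian mixture with a shared variance, plus parameter cost."""
    G, n = len(X), len(X[0])
    cents = centroids_of(X, labels, K)
    rss = sum((X[g][j] - cents[labels[g]][j]) ** 2 for g in range(G) for j in range(n))
    N = G * n
    var = max(rss / N, 1e-12)  # ML variance, guarded against zero
    # Gaussian code length, up to a constant fixed by the quantization precision
    nll = 0.5 * N * math.log2(2 * math.pi * math.e * var)
    assign = G * math.log2(K)  # uniform code for each gene's cluster label
    params = 0.5 * (K * n + 1) * math.log2(N)  # centroids plus shared variance
    return nll + assign + params

def label_bits(cents, subset, y):
    """Part 2 (stand-in): bits for binary labels y given the selected centroids."""
    n, K = len(y), len(cents)
    bits = K  # one bit per cluster: selected or not
    if not subset:  # no predictors: code the labels directly
        return bits + math.log2(n + 1) + math.log2(math.comb(n, sum(y)))
    score = [sum(cents[k][j] for k in subset) / len(subset) for j in range(n)]
    order = sorted(range(n), key=lambda j: score[j])
    best_err = n
    for cut in range(n + 1):
        for flip in (0, 1):
            # predict 1 at or above the cut position (0 below), optionally flipped
            err = sum(((pos >= cut) ^ flip) != y[j] for pos, j in enumerate(order))
            best_err = min(best_err, err)
    # direction bit + cut position + error count + which samples are errors
    return bits + 1 + 2 * math.log2(n + 1) + math.log2(math.comb(n, best_err))

def total_bits(X, labels, K, y):
    """Joint criterion: part 1 plus the best part 2 over all cluster subsets."""
    cents = centroids_of(X, labels, K)
    best = min(
        (label_bits(cents, s, y), s)
        for r in range(K + 1)
        for s in combinations(range(K), r)
    )
    return clustering_bits(X, labels, K) + best[0], best[1]

X = [
    [0.0, 0.1, 0.0, 5.0, 5.1, 5.0],     # genes 0-1 separate the two classes
    [0.1, 0.0, 0.1, 5.1, 5.0, 5.1],
    [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],  # genes 2-3 do not
    [1.1, -1.0, 0.9, -1.1, 1.0, -0.9],
]
y = [0, 0, 0, 1, 1, 1]
for labels in ([0, 0, 1, 1], [0, 1, 0, 1]):
    bits, subset = total_bits(X, labels, 2, y)
    print(labels, round(bits, 1), subset)
```

On this constructed example, the clustering that groups genes 0 and 1 together gets the shorter total and selects cluster 0; the mixed clustering is penalized mainly through part 1. In a real analysis the clustering would be searched rather than enumerated, and part 2 would use a proper classification model.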
- The address information is for the authors who have a + after their name.
- Authors who are presenting talks have a * after their name.
Back to the full JSM 2002 program