JSM 2004 - Toronto

Abstract #301202

This is the preliminary program for the 2004 Joint Statistical Meetings in Toronto, Canada. It currently includes the technical program (the schedule of invited, topic-contributed, regular contributed, and poster sessions); Continuing Education courses (August 7-10, 2004); and committee and business meetings. This online program will be updated frequently to reflect the most current revisions.


The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.





Activity Number: 415
Type: Contributed
Date/Time: Thursday, August 12, 2004 : 8:30 AM to 10:20 AM
Sponsor: Biometrics Section
Abstract - #301202
Title: Clustering Gene Expression Data Based on P Values
Author(s): Jun Li*+ and Rebecka J. Jornsten and Regina Y. Liu
Companies: Rutgers University and Rutgers University and Rutgers University
Address: 23689 Bpo Way, Piscataway, NJ 08854
Keywords: clustering ; measure of similarity ; pairwise p values ; combining p values
Abstract:

Clustering is an important task in many statistical analyses, including the analysis of microarray gene expression data, machine learning, and information retrieval. We use the analysis of microarray gene expression data to motivate and develop a new test-based clustering methodology that can reflect the exact experimental setup under which gene expression data are collected. We group genes by testing the equality or similarity of the condition-mean vectors and condition-variances. We use the p value from this test as a measure of similarity between two genes (or two groups of genes), where a small p value indicates that the mean vectors and/or variance vectors differ significantly. We view this measure of similarity as less arbitrary than existing choices such as Euclidean distance or correlation. We cluster genes using all pairwise p values, building clusters in a bottom-up or top-down manner. For validation, we use Fisher's method for combining p values to assign final p values to candidate clusters. Only clusters that satisfy the standard testing criteria are retained. To illustrate our clustering methodology, we use simulated and publicly available gene expression datasets.
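
The two ingredients named in the abstract, bottom-up clustering on pairwise p values and Fisher's method for combining p values, can be sketched roughly as follows. This is an illustrative sketch only, not the authors' procedure: the abstract does not specify the underlying test, the linkage rule, or the retention criterion. The function names, the precomputed matrix `p_matrix`, the "least similar pair" linkage, and the threshold `alpha` are all assumptions made here. Note also that Fisher's method assumes independent p values, which pairwise p values within one cluster generally are not.

```python
import math
from itertools import combinations


def fisher_combine(p_values):
    """Combine p values with Fisher's method.

    X = -2 * sum(ln p_i) is chi-square with 2k degrees of freedom under the
    null when the k p values are independent. For even degrees of freedom
    the upper tail has a closed form, so only the standard library is used.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p values must lie in (0, 1]")
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    # P(chi2 with 2k df > X) = exp(-X/2) * sum_{i<k} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return min(1.0, math.exp(-half_x) * total)


def cluster_bottom_up(p_matrix, alpha=0.05):
    """Greedy bottom-up clustering on a symmetric matrix of pairwise p values.

    Assumption (not from the abstract): two clusters are as similar as their
    least similar pair of members, and merging stops once no pair of
    clusters has similarity of at least alpha.
    """
    clusters = [[i] for i in range(len(p_matrix))]
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for a, b in combinations(range(len(clusters)), 2):
            sim = min(p_matrix[i][j] for i in clusters[a] for j in clusters[b])
            if sim > best_sim:
                best_sim, best_pair = sim, (a, b)
        if best_sim < alpha:
            break
        a, b = best_pair  # a < b, so deleting b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def within_cluster_p(cluster, p_matrix):
    """Fisher-combined p value over all member pairs; None for singletons."""
    pairs = [p_matrix[i][j] for i, j in combinations(cluster, 2)]
    return fisher_combine(pairs) if pairs else None


if __name__ == "__main__":
    # Hypothetical pairwise p values for four genes (made-up numbers).
    p_matrix = [
        [1.00, 0.80, 0.01, 0.01],
        [0.80, 1.00, 0.01, 0.01],
        [0.01, 0.01, 1.00, 0.60],
        [0.01, 0.01, 0.60, 1.00],
    ]
    for cluster in cluster_bottom_up(p_matrix):
        print(cluster, within_cluster_p(cluster, p_matrix))
```

With the made-up matrix above, genes 0 and 1 merge first, then genes 2 and 3, and the two resulting groups stay separate because their cross-group p values fall below `alpha`. Whether a candidate cluster is then kept would depend on the testing criterion applied to its combined p value, which the abstract leaves unspecified.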


  • The address information is for the authors that have a + after their name.
  • Authors who are presenting talks have a * after their name.


JSM 2004 For information, contact jsm@amstat.org or phone (888) 231-3473. If you have questions about the Continuing Education program, please contact the Education Department.
Revised March 2004