JSM 2004 - Toronto

Abstract #300394

This is the preliminary program for the 2004 Joint Statistical Meetings in Toronto, Canada. It currently includes the technical program (the schedule of invited, topic-contributed, regular contributed, and poster sessions); Continuing Education courses (August 7-10, 2004); and committee and business meetings. This online program will be updated frequently to reflect the latest revisions.


The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.





Activity Number: 384
Type: Contributed
Date/Time: Wednesday, August 11, 2004 : 2:00 PM to 3:50 PM
Sponsor: Biometrics Section
Abstract - #300394
Title: Cluster Analysis for Continuous Data with an Excess of Zeros
Author(s): Kimberly Siegmund*+
Companies: University of Southern California
Address: 1540 Alcazar Street, CHP-220, Los Angeles, CA, 90089
Keywords: class discovery ; cluster analysis ; mixture models ; gene expression
Abstract:

Abnormal DNA methylation is typical in cancer. Because DNA methylation profiles vary across tumor types and subtypes, it is believed that clustering these profiles may uncover novel disease subgroups. DNA methylation data obtained with the MethyLight technology are quantitative with an excess of zeros: for a given region of DNA, MethyLight measures the occurrence of fully methylated alleles, finding none in many samples and variable levels in others. We introduce a Bernoulli-lognormal mixture model for clustering continuous data with an excess of zeros and compare it to standard model-based clustering methods for discrete data and for continuous data. In a simulation study, the Bernoulli-lognormal mixture model has the lowest misclassification error rate among the approaches compared. We illustrate the methods using DNA methylation profiles from a study of lung cancer cell lines. There, the Bernoulli-lognormal mixture model has the lowest cross-validation error for distinguishing lung cancer subtype (non-small cell vs. small cell) and allocates samples to classes with the lowest uncertainty.
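
The abstract does not specify how the Bernoulli-lognormal mixture is parameterized or fitted, so the following is only a minimal sketch of one plausible reading, not the author's implementation. It assumes that, within each cluster, every feature is independently either zero (with a cluster-specific probability) or drawn from a lognormal distribution, and that the mixture is fitted by EM from a random soft initialization. The function names (`log_density`, `m_step`, `e_step`, `fit`), the constant `EPS`, and the independence assumption are all illustrative choices that do not come from the abstract. Input values are assumed to be non-negative.

```python
import math
import random

EPS = 1e-6  # keeps probabilities and variances away from degenerate values


def log_density(x, p_pos, mu, sigma):
    """Log density of one value: point mass at zero, lognormal otherwise."""
    if x == 0:
        return math.log(1.0 - p_pos)
    z = (math.log(x) - mu) / sigma
    return (math.log(p_pos)
            - math.log(x * sigma * math.sqrt(2.0 * math.pi))
            - 0.5 * z * z)


def m_step(data, resp, k):
    """Weighted estimates for cluster k: weight, then per-feature
    P(x > 0) and lognormal mu and sigma (on the log scale)."""
    n, d = len(data), len(data[0])
    total = sum(r[k] for r in resp)
    p_pos, mu, sigma = [], [], []
    for j in range(d):
        # (responsibility, log value) pairs for the positive observations
        pos = [(resp[i][k], math.log(data[i][j]))
               for i in range(n) if data[i][j] > 0]
        w_pos = sum(w for w, _ in pos)
        p_pos.append(min(max(w_pos / max(total, EPS), EPS), 1.0 - EPS))
        if w_pos > EPS:
            m = sum(w * lx for w, lx in pos) / w_pos
            v = sum(w * (lx - m) ** 2 for w, lx in pos) / w_pos
        else:
            m, v = 0.0, 1.0  # no positive mass: fall back to a default
        mu.append(m)
        sigma.append(math.sqrt(max(v, EPS)))
    return total / n, p_pos, mu, sigma


def e_step(data, params):
    """Posterior cluster probabilities for each sample, computed in log space."""
    resp = []
    for row in data:
        logs = [math.log(max(w, EPS))
                + sum(log_density(x, p[j], m[j], s[j]) for j, x in enumerate(row))
                for (w, p, m, s) in params]
        top = max(logs)
        unnorm = [math.exp(v - top) for v in logs]
        norm = sum(unnorm)
        resp.append([u / norm for u in unnorm])
    return resp


def fit(data, n_clusters, n_iter=50, seed=0):
    """Run a fixed number of EM iterations from random soft assignments."""
    rng = random.Random(seed)
    resp = []
    for _ in data:
        raw = [rng.random() + EPS for _ in range(n_clusters)]
        resp.append([r / sum(raw) for r in raw])
    for _ in range(n_iter):
        params = [m_step(data, resp, k) for k in range(n_clusters)]
        resp = e_step(data, params)
    return params, resp
```

Given `params, resp = fit(data, 2)`, each row of `resp` holds a sample's posterior cluster probabilities, so a hard assignment is the index of the largest entry, and the size of that largest entry is one possible measure of how confidently the sample was allocated. A fixed iteration count is used here for brevity; a practical fit would more likely monitor the log-likelihood for convergence and try several random starts.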


  • The address information is for the authors that have a + after their name.
  • Authors who are presenting talks have a * after their name.


JSM 2004 For information, contact jsm@amstat.org or phone (888) 231-3473. If you have questions about the Continuing Education program, please contact the Education Department.
Revised March 2004