JSM Preliminary Online Program
This is the preliminary program for the 2008 Joint Statistical Meetings in Denver, Colorado.

The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.


Activity Number: 336
Type: Invited
Date/Time: Wednesday, August 6, 2008 : 8:30 AM to 10:20 AM
Sponsor: IMS
Abstract - #300288
Title: Mixture Models for Document Clustering
Author(s): Edward J. Wegman*+
Companies: George Mason University
Address: 4400 University Drive, Fairfax, VA, 22030-4422
Keywords:
Abstract:

Automatic clustering and classification of documents within corpora is a challenging task. It is often done by comparing word usage within the corpus, the so-called bag-of-words methodology. In this talk, besides comparing word usage, we extract further endogenous features of the documents and use a mixture model density estimate to localize the clusters.
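
As a rough illustration of the two ingredients named in the abstract, the sketch below builds bag-of-words count vectors and fits a small mixture model to them with EM. It is not the method presented in the talk: the abstract does not specify the mixture family or the additional endogenous features, so a mixture of multinomials over word counts is assumed here, and the function names, the toy documents, the seed-document initialization, and the smoothing constant `alpha` are all hypothetical.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    counts = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for word, c in Counter(toks).items():
            row[index[word]] = c
        counts.append(row)
    return vocab, counts


def fit_multinomial_mixture(counts, seeds, n_iter=50, alpha=0.1):
    """EM for a mixture of multinomials (an assumed model, not the talk's).

    `seeds` lists one document index per component; each component's word
    distribution starts from that document's smoothed counts.
    """
    n, k, v = len(counts), len(seeds), len(counts[0])
    weights = [1.0 / k] * k
    probs = []
    for s in seeds:
        total = sum(counts[s]) + alpha * v
        probs.append([(c + alpha) / total for c in counts[s]])
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each document,
        # computed in log space to avoid underflow.
        resp = []
        for row in counts:
            logp = [
                math.log(weights[j])
                + sum(c * math.log(probs[j][i]) for i, c in enumerate(row) if c)
                for j in range(k)
            ]
            top = max(logp)
            exps = [math.exp(lp - top) for lp in logp]
            z = sum(exps)
            resp.append([e / z for e in exps])
        # M-step: re-estimate mixing weights and word distributions,
        # with additive smoothing so no probability is exactly zero.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = (nj + alpha) / (n + k * alpha)
            mass = [
                sum(r[j] * row[i] for r, row in zip(resp, counts)) + alpha
                for i in range(v)
            ]
            total = sum(mass)
            probs[j] = [x / total for x in mass]
    return weights, probs, resp


# Toy corpus (hypothetical): two finance-like and two sports-like documents.
docs = [
    "stock market prices fall",
    "market prices rise as stock rallies",
    "team wins the football match",
    "football team loses match",
]
vocab, counts = bag_of_words(docs)
weights, probs, resp = fit_multinomial_mixture(counts, seeds=[0, 2])
# Assign each document to its most probable component.
labels = [max(range(len(r)), key=r.__getitem__) for r in resp]
print(labels)
```

On this toy corpus the two finance-like documents should fall in one component and the two sports-like documents in the other. Real corpora would need proper tokenization, more careful initialization, and a way to choose the number of components.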


  • The address information is for authors who have a + after their name.
  • Authors who are presenting talks have a * after their name.

Revised September, 2008