JSM 2011 Online Program

The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.

Abstract Details

Activity Number: 646
Type: Topic Contributed
Date/Time: Thursday, August 4, 2011, 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Computing
Abstract - #303267
Title: Incorporating a Self-Tuning Diffusion Map Framework Into Document Clustering
Author(s): Rebecca Nugent*+ and David Friedenberg
Companies: Carnegie Mellon University and Battelle Memorial Institute
Address: 5000 Forbes Avenue, Pittsburgh, PA, 15213, US
Keywords: document clustering ; diffusion maps ; self-tuning
Abstract:

Document clustering has been a rich research area, resulting in algorithms for grouping a fixed or streaming corpus when topic labels are unknown or pre-defined. Regardless of approach, most methods must analyze a very high-dimensional space: the words in the corpus lexicon. This dimensionality is often reduced prior to analysis via a statistical threshold or a common-sense heuristic (e.g., removing words like "the"). It might be beneficial to remove this somewhat subjective decision. Diffusion maps are a powerful tool for identifying complicated structure and reducing dimensionality in a wide variety of applications. By representing the connectivity of a data set, diffusion maps project observations into a space in which standard methods can more easily model the structure. We explore the use of a flexible self-tuning diffusion map framework that incorporates local tuning parameters to capture group structure of varying density, if present, in a corpus of documents. Our work thus far has also shown that the choice of clustering method matters less once the documents are in the reduced projected space. Results are shown for benchmark text classification data sets.
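
As a rough illustration of the idea, the following is a minimal sketch of a diffusion map with local tuning parameters. It is not the authors' framework, whose details the abstract does not give. The sketch assumes a Gaussian kernel whose bandwidth for each pair of observations is the product of their local scales, with each local scale set to the distance to the k-th nearest neighbor (a common "local scaling" choice). The function name, the parameters `k` and `t`, and the use of Euclidean distance are all assumptions made for illustration.

```python
import numpy as np

def self_tuning_diffusion_map(X, n_components=2, k=7, t=1):
    """Embed the rows of X with a diffusion map that uses per-point scales.

    Hypothetical sketch: the kernel, the local-scale rule (distance to the
    k-th nearest neighbor), and the parameter names are assumptions.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise squared Euclidean distances (clipped at zero for round-off).
    sq = np.sum(X ** 2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(D2, 0.0)

    # Local scale sigma_i: distance to the k-th nearest neighbor.
    # After sorting each row, column 0 is the point itself (distance 0).
    sigma = np.sqrt(np.sort(D2, axis=1)[:, min(k, n - 1)])
    sigma = np.maximum(sigma, 1e-12)  # guard against duplicate points

    # Locally scaled Gaussian affinity: exp(-d_ij^2 / (sigma_i * sigma_j)).
    K = np.exp(-D2 / (sigma[:, None] * sigma[None, :]))

    # Symmetric normalization S = D^{-1/2} K D^{-1/2}; S has the same
    # eigenvalues as the Markov matrix P = D^{-1} K.
    d = K.sum(axis=1)
    S = K / np.sqrt(d[:, None] * d[None, :])

    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]          # largest eigenvalue first
    evals, evecs = evals[order], evecs[:, order]

    # Right eigenvectors of P are D^{-1/2} times the eigenvectors of S.
    psi = evecs / np.sqrt(d)[:, None]

    # Skip the trivial constant eigenvector; scale by eigenvalue^t.
    return psi[:, 1:n_components + 1] * evals[1:n_components + 1] ** t
```

In a document setting, the rows of `X` might be term-frequency vectors, and the returned coordinates could then be passed to any standard clustering method. Because each point carries its own scale, a tight group and a diffuse group can both remain internally connected, which is the motivation for local tuning when group densities vary.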


The address information is for the authors that have a + after their name.
Authors who are presenting talks have a * after their name.
