JSM 2011 Online Program

The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.

Abstract Details

Activity Number: 73
Type: Contributed
Date/Time: Sunday, July 31, 2011 : 4:00 PM to 5:50 PM
Sponsor: Section on Bayesian Statistical Science
Abstract - #301871
Title: Probabilistic Modeling of Text Data: A Review
Author(s): Shibasish Dasgupta*+
Companies: University of Florida at Gainesville
Address: Department of Statistics, Gainesville, FL, 32608, USA
Keywords: probabilistic inference ; text classification ; information retrieval ; document generalization

The management of large and growing collections of information is a central goal of modern statistical science. Data repositories of texts have become widely accessible, thus necessitating good methods of retrieval, organization, and exploration. Probabilistic models have been central to these tasks, with applications in settings such as text classification, information retrieval, text segmentation, and information extraction.

These methods entail two stages: (1) estimate or compute the posterior distribution of the parameters of a probabilistic model from a collection of text; and (2) for new documents, answer the question at hand (e.g., classification, retrieval) via probabilistic inference.
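As a rough illustration of these two stages, the sketch below uses a multinomial model of word counts with a symmetric Dirichlet prior for text classification. The abstract does not name a specific model, so this choice, the function names fit_posterior and predict, the toy documents, and the value alpha=1.0 are all assumptions made for illustration. Stage 2 plugs in posterior mean word probabilities and uses empirical class frequencies as the class prior, which is a simplification of full posterior predictive inference.

```python
import math
from collections import Counter


def fit_posterior(docs, labels, alpha=1.0):
    """Stage 1: compute Dirichlet posterior parameters for each class.

    Assumes a multinomial likelihood over words and a symmetric
    Dirichlet(alpha) prior, so the posterior is Dirichlet(alpha + counts).
    """
    vocab = sorted({w for d in docs for w in d.split()})
    counts = {c: Counter() for c in set(labels)}
    for d, c in zip(docs, labels):
        counts[c].update(d.split())
    posterior = {c: {w: alpha + counts[c][w] for w in vocab} for c in counts}
    class_counts = Counter(labels)
    return posterior, class_counts


def predict(doc, posterior, class_counts):
    """Stage 2: classify a new document via probabilistic inference.

    Scores each class by log prior plus the log of the posterior mean
    word probabilities; words outside the training vocabulary are skipped.
    """
    n_docs = sum(class_counts.values())
    scores = {}
    for c, params in posterior.items():
        total = sum(params.values())
        log_p = math.log(class_counts[c] / n_docs)
        for w in doc.split():
            if w in params:
                log_p += math.log(params[w] / total)
        scores[c] = log_p
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy corpus, for illustration only.
docs = [
    "bayes prior posterior",
    "posterior inference bayes",
    "goal match score",
    "score goal team",
]
labels = ["stats", "stats", "sports", "sports"]

posterior, class_counts = fit_posterior(docs, labels)
label, scores = predict("prior inference", posterior, class_counts)
print(label)
```

The same fitted posterior could, under similar assumptions, be reused for retrieval by ranking documents or classes by their scores rather than returning only the top one.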

The goal of such modeling is document generalization. Given a new document, how is it similar to the previously seen documents? Where does it fit within them? What can one predict about it? Efficiently answering such questions is the focus of the statistical analysis of document collections. This talk will consider the problem of modeling text corpora.

The address information is for the authors that have a + after their name.
Authors who are presenting talks have a * after their name.

