Abstract #301194

This is the preliminary program for the 2003 Joint Statistical Meetings in San Francisco, California. It currently includes the technical program (the schedule of invited, topic-contributed, regular contributed, and poster sessions); Continuing Education courses (August 2-5, 2003); and Committee and Business Meetings. This online program will be updated frequently to reflect the latest revisions.


The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.





JSM 2003 Abstract #301194
Activity Number: 316
Type: Invited
Date/Time: Wednesday, August 6, 2003 : 8:30 AM to 10:20 AM
Sponsor: General Methodology
Title: Hierarchical Bayesian Modeling of Text and Image Statistics
Author(s): David M. Blei*+ and Michael I. Jordan+
Companies: University of California, Berkeley and University of California, Berkeley
Address: 840 47th St. #4, Oakland, CA, 94608-3200; EECS Computer Science Division, Berkeley, CA, 94720-1776
Keywords: graphical models ; empirical Bayes ; hierarchical models ; information retrieval ; image processing
Abstract:

We present a simple hierarchical Bayesian model for collections of multimedia documents. The model posits that documents or images are generated by choosing a random set of multinomial probabilities for a set of possible "topics," and then repeatedly generating words by sampling from the topic mixture. This model is intractable for exact probabilistic inference, but approximate posterior probabilities and marginal likelihoods can be obtained via fast variational methods. We also consider the problem of modeling annotated data, i.e., data with multiple types where an instance of one type (such as a caption) serves as a description of the other type (such as an image). We describe three hierarchical mixture models that are aimed at such problems, culminating in the "Corr-LDA model," a latent variable model that is effective at both joint clustering and automatic annotation. We conduct experiments testing these models using the Corel database of images and captions.
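
The generative process described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a Dirichlet prior over the per-document topic proportions (the abstract says only "a random set of multinomial probabilities"), and the names ALPHA, VOCAB, TOPIC_WORD_PROBS, sample_dirichlet, and generate_document, along with the toy numbers, are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical toy setup: 2 topics over a 4-word vocabulary.
ALPHA = [0.5, 0.5]                      # assumed Dirichlet parameters
VOCAB = ["sky", "water", "tiger", "grass"]
TOPIC_WORD_PROBS = [
    [0.5, 0.4, 0.05, 0.05],            # topic 0 favors "sky", "water"
    [0.05, 0.05, 0.5, 0.4],            # topic 1 favors "tiger", "grass"
]


def sample_dirichlet(alpha, rng):
    """Draw a probability vector from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(alpha, topic_word_probs, vocab, n_words, rng):
    """Generate one document: topic proportions first, then one topic and word per position."""
    # Step 1: choose the document's random topic proportions.
    theta = sample_dirichlet(alpha, rng)
    assignments, words = [], []
    for _ in range(n_words):
        # Step 2: pick a topic for this word from the document's mixture.
        z = rng.choices(range(len(alpha)), weights=theta)[0]
        # Step 3: pick the word from that topic's multinomial over the vocabulary.
        w = rng.choices(vocab, weights=topic_word_probs[z])[0]
        assignments.append(z)
        words.append(w)
    return theta, assignments, words


if __name__ == "__main__":
    rng = random.Random(0)
    theta, assignments, words = generate_document(
        ALPHA, TOPIC_WORD_PROBS, VOCAB, n_words=8, rng=rng
    )
    print("topic proportions:", theta)
    print("topic assignments:", assignments)
    print("words:", words)
```

The sketch covers only forward sampling. The posterior over the topic proportions and assignments given observed words is the part the abstract describes as intractable for exact inference and handled with variational methods, which this example does not attempt.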


  • The address information is for the authors who have a + after their name.
  • Authors who are presenting talks have a * after their name.


JSM 2003: For information, contact meetings@amstat.org or phone (703) 684-1221. If you have questions about the Continuing Education program, please contact the Education Department.
Revised March 2003