JSM 2004 - Toronto

Abstract #301262

This is the preliminary program for the 2004 Joint Statistical Meetings in Toronto, Canada. It currently includes the technical program (the schedule of invited, topic-contributed, regular contributed, and poster sessions); Continuing Education courses (August 7-10, 2004); and committee and business meetings. This online program will be updated frequently to reflect the most current revisions.


The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.





Activity Number: 425
Type: Contributed
Date/Time: Thursday, August 12, 2004 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Computing
Abstract - #301262
Title: Improving Likelihood-based Data-squashing
Author(s): Yuchung J. Wang*+ and Xiaodong Sun
Companies: Rutgers University-Camden and Ask Jeeves, Inc.
Address: 370 Armitage Hall, Camden, NJ, 08102
Keywords: data mining ; imputation ; uniform design ; clustering ; representative observation
Abstract:

Data-squashing reduces a very large dataset, called the mother data, to a much smaller dataset, called the squashed data, with the requirement that statistical models and decisions derived from the squashed data be identical or similar to those derived from the mother data. The squashed data may be viewed as a collection of imputations. Madigan et al. (2003) proposed likelihood-based squashing, which assumes that a certain likelihood is the data-generating mechanism of the mother data. They artificially create a likelihood profile for every observation in the mother data and subsequently cluster the observations according to their profiles. From each cluster, an imputed observation is constructed along with a weight. We address the question of how to construct a better profile. We shall illustrate that (a) the more likelihood values are computed, the better the imputation, and (b) a uniform design can improve upon a factorial design in sampling the likelihood. We also propose an accelerated clustering algorithm, because k-means clustering converges too slowly to be computationally feasible when the number of profiles is large.
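The profile-then-cluster pipeline described in the abstract can be sketched roughly as follows. This is only an illustration under strong assumptions, not the authors' or Madigan et al.'s actual procedure: the model is a normal distribution with unknown mean and unit variance, the parameter points in `thetas` are a hypothetical hand-picked grid (not a uniform or factorial design), the clustering is plain Lloyd k-means rather than the accelerated algorithm, and each pseudo-observation is simply the cluster mean weighted by cluster size. All function names are made up for this example.

```python
import math
import random


def log_lik(y, mu, sigma=1.0):
    """Normal log-density of one observation y at parameter value mu."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - 0.5 * ((y - mu) / sigma) ** 2


def likelihood_profiles(data, thetas):
    """One profile per observation: its log-likelihood at each parameter point."""
    return [[log_lik(y, mu) for mu in thetas] for y in data]


def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - z) ** 2 for x, z in zip(a, b))


def kmeans_labels(profiles, k, iters=15, seed=0):
    """Plain Lloyd k-means on the profiles; returns one cluster label per row."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(profiles, k)]
    labels = [0] * len(profiles)
    for _ in range(iters):
        # Assignment step: nearest center for every profile.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in profiles]
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def squash(data, labels, k):
    """One (pseudo-observation, weight) pair per non-empty cluster.

    Assumption: the pseudo-observation is the cluster mean and the weight
    is the cluster size; the abstract does not specify the construction.
    """
    squashed = []
    for j in range(k):
        members = [y for y, lab in zip(data, labels) if lab == j]
        if members:
            squashed.append((sum(members) / len(members), len(members)))
    return squashed


rng = random.Random(1)
mother = [rng.gauss(2.0, 1.0) for _ in range(1000)]   # synthetic mother data
thetas = [0.0, 1.0, 2.0, 3.0, 4.0]                    # hypothetical parameter points
profiles = likelihood_profiles(mother, thetas)
labels = kmeans_labels(profiles, k=10)
squashed = squash(mother, labels, k=10)

# Compare the maximum-likelihood estimate of the mean on both datasets.
full_mle = sum(mother) / len(mother)
squashed_mle = sum(y * w for y, w in squashed) / sum(w for _, w in squashed)
print(len(mother), "observations squashed to", len(squashed), "weighted points")
```

In this toy setting the weighted mean of the squashed data reproduces the full-data mean by construction, so the agreement says nothing about how well squashing preserves estimates in richer models; that is where the choice of parameter points and the number of likelihood values per profile would be expected to matter.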


  • The address information is for the authors that have a + after their name.
  • Authors who are presenting talks have a * after their name.


JSM 2004 For information, contact jsm@amstat.org or phone (888) 231-3473. If you have questions about the Continuing Education program, please contact the Education Department.
Revised March 2004