JSM Preliminary Online Program
This is the preliminary program for the 2006 Joint Statistical Meetings in Seattle, Washington.

The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.



Activity Number: 240
Type: Contributed
Date/Time: Tuesday, August 8, 2006, 8:30 AM to 10:20 AM
Sponsor: IMS
Abstract - #307209
Title: On the Accuracy of Data Squashing
Author(s): Atsuyuki Kogure*+ and Masahiko Sagae
Companies: Keio University and Gifu University
Address: 5322 Endoh, Fujisawa, 252-8520, Japan
Keywords: data squashing; massive data sets; MLE; binning; local moments; kernel method
Abstract:

The concept of "data squashing" was introduced by DuMouchel, Volinsky, Johnson, Cortes and Pregibon (1999) to reduce the computational burden of statistically analyzing massive data sets. The key idea is to squash the original massive data set into a smaller representative sample and then apply a statistical procedure, such as maximum likelihood estimation (MLE), to the squashed data. While many examples have been reported that demonstrate the method's practicality, little research has addressed its accuracy. In this talk we give theoretical arguments for evaluating the accuracy of the data squashing method. To construct the squashed data, DVJCP (1999) partition the whole data range into bins and, within each bin, match the local moments of the squashed data to those of the original data. We interpret this construction as a density estimation process and investigate it within the framework of the kernel method.
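The binning-and-moment-matching idea can be illustrated with a simplified one-dimensional sketch. This is not the DVJCP (1999) procedure, which is designed for multivariate data; it assumes equal-width bins and replaces each non-empty bin with two equally weighted pseudo-points at the bin mean plus or minus the bin standard deviation, so that each bin's count, mean and variance (moments up to order two only) are matched. The functions `squash`, `normal_mle` and `weighted_kde`, the number of bins and the bandwidth `h` are hypothetical choices made for illustration. The last function reflects the density-estimation view: the weighted squashed points define a kernel density estimate that can be compared with the one built from the full data.

```python
import math
import random
from collections import defaultdict


def squash(data, n_bins):
    """Hypothetical 1-D squasher: equal-width bins, two equally weighted
    pseudo-points per non-empty bin placed at mean - sd and mean + sd.
    Each bin's count, mean and population variance are preserved."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins or 1.0  # guard against constant data
    bins = defaultdict(list)
    for x in data:
        k = min(int((x - lo) / width), n_bins - 1)  # put the maximum in the last bin
        bins[k].append(x)
    points, weights = [], []
    for k in sorted(bins):
        xs = bins[k]
        n = len(xs)
        m = sum(xs) / n
        sd = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
        points += [m - sd, m + sd]
        weights += [n / 2, n / 2]
    return points, weights


def normal_mle(points, weights=None):
    """Weighted MLE of a normal mean and variance (unit weights if None)."""
    if weights is None:
        weights = [1.0] * len(points)
    w = sum(weights)
    mu = sum(wi * x for x, wi in zip(points, weights)) / w
    var = sum(wi * (x - mu) ** 2 for x, wi in zip(points, weights)) / w
    return mu, var


def weighted_kde(x, points, weights, h):
    """Gaussian kernel density estimate at x from weighted points, bandwidth h."""
    w = sum(weights)
    s = sum(wi * math.exp(-0.5 * ((x - p) / h) ** 2) for p, wi in zip(points, weights))
    return s / (w * h * math.sqrt(2 * math.pi))


# Usage: squash 20,000 simulated N(5, 2^2) draws into at most 100 weighted points.
random.seed(0)
data = [random.gauss(5.0, 2.0) for _ in range(20_000)]
pts, wts = squash(data, n_bins=50)
mle_full = normal_mle(data)
mle_squashed = normal_mle(pts, wts)
f_full = weighted_kde(5.0, data, [1.0] * len(data), h=0.5)
f_squashed = weighted_kde(5.0, pts, wts, h=0.5)
print(len(data), "->", len(pts))
print("MLE (full, squashed):", mle_full, mle_squashed)
print("KDE at 5.0 (full, squashed):", f_full, f_squashed)
```

Because the normal log-likelihood depends on the data only through the sample size, the sum and the sum of squares, matching these moments bin by bin reproduces the full-data MLE exactly, up to rounding. For models whose likelihood depends on the data in other ways, the squashed estimate is only an approximation, and its accuracy then depends on choices such as the bin width and the number of matched moments.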


  • The address information is for the authors that have a + after their name.
  • Authors who are presenting talks have a * after their name.


JSM 2006: For information, contact jsm@amstat.org or phone (888) 231-3473. If you have questions about the Continuing Education program, please contact the Education Department.
Revised April 2006