JSM 2004 - Toronto

Abstract #301000

This is the preliminary program for the 2004 Joint Statistical Meetings in Toronto, Canada. It currently includes the technical program (the schedule of invited, topic-contributed, regular contributed, and poster sessions); Continuing Education courses (August 7-10, 2004); and Committee and Business Meetings. This online program will be updated frequently to reflect the most current revisions.

The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.



Activity Number: 229
Type: Contributed
Date/Time: Tuesday, August 10, 2004 : 10:30 AM to 12:20 PM
Sponsor: Section on Survey Research Methods
Abstract - #301000
Title: Effects of Grouping Data on First and Second Distribution Moments
Author(s): Myron Katzoff*+ and Jay J. Kim and Joe Fred Gonzalez, Jr. and Lawrence H. Cox
Companies: National Center for Health Statistics and National Center for Health Statistics and Centers for Disease Control and Prevention and National Center for Health Statistics
Address: 3311 Toledo Rd., Hyattsville, MD, 20782
Keywords: disclosure risk ; interval-data ; variance ; class mark ; data summarization ; midpoint
Abstract:

Data such as income are often grouped and released as interval data. Grouping is considered one of the best ways of summarizing data, and it also has implications for disclosure risk. Class marks (midpoints) of the intervals are then used to calculate the mean and variance of the grouped data. In most situations, using the midpoint for every observation in an interval smooths the data, thereby reducing the variance. It can be shown, as in analysis of variance, that using midpoints loses the within-interval variance component if the within-interval data have a uniform distribution. However, if the distributions within some intervals are peaked or skewed, using the interval midpoints can result in higher variance estimates than would be obtained with the raw data. Moreover, for such data, the mean of the grouped data based on midpoints is biased. If class (conditional) means are used to calculate the overall mean and variance, the mean of the raw data can be recovered and the variance will be lower. We report some initial results from our study of the impact of accepted practices for approximating moments with summarized data.
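
The comparison described in the abstract can be illustrated with a small sketch. This is not the authors' code: the function name `grouped_moments`, the interval edges, and the toy data are all hypothetical, and the sketch assumes population (divide-by-n) variances so that the analysis-of-variance decomposition into between-interval and within-interval components holds exactly. It computes the mean and variance three ways: from the raw data, from interval midpoints, and from class (conditional) means.

```python
import numpy as np

def grouped_moments(values, edges):
    """Mean and population variance from raw data, midpoints, and class means.

    Hypothetical helper for illustration only. Intervals are
    [edges[k], edges[k+1]), with the last interval closed on the right.
    """
    values = np.asarray(values, dtype=float)
    edges = np.asarray(edges, dtype=float)
    n_bins = len(edges) - 1

    # Interval index of each observation.
    idx = np.clip(np.searchsorted(edges, values, side="right") - 1, 0, n_bins - 1)

    counts = np.bincount(idx, minlength=n_bins)
    sums = np.bincount(idx, weights=values, minlength=n_bins)

    midpoints = (edges[:-1] + edges[1:]) / 2.0
    # Class (conditional) means; empty intervals carry zero weight,
    # so the midpoint used as a placeholder there has no effect.
    class_means = np.where(counts > 0, sums / np.maximum(counts, 1), midpoints)

    n = values.size

    def moments(representatives):
        # Every observation in an interval is replaced by one representative value.
        mean = np.sum(counts * representatives) / n
        var = np.sum(counts * (representatives - mean) ** 2) / n
        return mean, var

    return {
        "raw": (values.mean(), values.var()),  # np.var defaults to ddof=0
        "midpoint": moments(midpoints),
        "class_mean": moments(class_means),
    }

# Toy data (an assumption): observations piled up near the shared boundary
# of two intervals, so the within-interval distributions are skewed.
values = [8, 9, 9, 11, 12, 13]
edges = [0, 10, 20]
result = grouped_moments(values, edges)

for label, (mean, var) in result.items():
    print(f"{label:>10}: mean={mean:.4f}  variance={var:.4f}")
```

With this toy data the midpoint-based variance exceeds the raw variance and the midpoint-based mean differs from the raw mean, while the class-mean version reproduces the raw mean and gives a variance no larger than the raw one, since only the within-interval component is dropped.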


  • The address information is for the authors that have a + after their name.
  • Authors who are presenting talks have a * after their name.

JSM 2004 For information, contact jsm@amstat.org or phone (888) 231-3473. If you have questions about the Continuing Education program, please contact the Education Department.
Revised March 2004