JSM 2005 - Minneapolis

Abstract #302466

This is the preliminary program for the 2005 Joint Statistical Meetings in Minneapolis, Minnesota. It currently includes the technical program (the schedule of invited, topic-contributed, regular contributed, and poster sessions), Continuing Education courses (August 7-10, 2005), and Committee and Business Meetings. This online program will be updated frequently to reflect the latest revisions.




The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.


In this program, each meeting room name is preceded by a letter code indicating the facility in which the room is located:

  • Minneapolis Convention Center = “MCC”
  • Hilton Minneapolis Hotel = “H”
  • Hyatt Regency Minneapolis = “HY”




Activity Number: 299
Type: Invited
Date/Time: Tuesday, August 9, 2005 : 2:00 PM to 3:50 PM
Sponsor: Section on Nonparametric Statistics
Title: Small Sample pmf Estimation and an Application to Language Modeling
Author(s): Bruno M. Jedynak*+
Companies: Johns Hopkins University
Address: Center for Imaging Science, Clark 302b, Baltimore, MD 21218-2686
Keywords: small sample; density estimation; Kullback; language; Zipf
Abstract:

In this paper, we propose a new method for estimating the probability mass function (pmf) of a discrete random variable with finitely many values from a small sample. We focus on the observed counts (the number of times each value appears in the sample) and define the Maximum Likelihood Set (MLS) as the set of pmfs that put more mass on the observed counts than on any other set of counts possible for the same sample size. We characterize the MLS in detail and show that it is a "diamond"-shaped subset of the probability simplex in [0,1]^k, bounded by at most k(k-1) hyperplanes, where k is the number of possible values of the random variable. The MLS always contains the empirical distribution, as well as a family of Bayesian estimators based on a Dirichlet prior, in particular the well-known Laplace estimator. Finally, we propose to select from the MLS the pmf that is "closest" to a fixed pmf that encodes prior information.
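A small sketch can make the MLS definition concrete. It is our own illustration, not the author's code, and every function name in it is hypothetical. It assumes ties are allowed (a pmf that gives the observed counts as much mass as some other count vector still belongs to the set), which is what makes the set closed and bounded by hyperplanes. One possible source of the k(k-1) hyperplanes, which is our reconstruction, is this: moving one count from value j to value i multiplies the multinomial probability by (n_j p_i) / ((n_i + 1) p_j), which gives one linear inequality p_i n_j <= p_j (n_i + 1) per ordered pair (i, j) with n_j > 0. Because the log-multinomial is separable and concave in the counts, these local comparisons should suffice. The code checks that claim against a brute-force use of the definition, with exact rational arithmetic so that boundary ties are not blurred by rounding. The Dirichlet estimator below is assumed to be the posterior mean under a symmetric Dirichlet(alpha) prior. This may or may not be the family the paper has in mind.

```python
from fractions import Fraction
from math import factorial, prod


def count_vectors(n_total, k):
    """Yield every tuple of k nonnegative integer counts that sums to n_total."""
    if k == 1:
        yield (n_total,)
        return
    for first in range(n_total + 1):
        for rest in count_vectors(n_total - first, k - 1):
            yield (first,) + rest


def multinomial_pmf(counts, p):
    """Exact multinomial probability of `counts` when each draw follows pmf `p`."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # stays an exact integer at every step
    return coef * prod(Fraction(pi) ** c for pi, c in zip(p, counts))


def in_mls_bruteforce(counts, p):
    """Definition check: no count vector of the same size is strictly more likely."""
    target = multinomial_pmf(counts, p)
    return all(multinomial_pmf(m, p) <= target
               for m in count_vectors(sum(counts), len(counts)))


def in_mls_pairwise(counts, p):
    """Check the linear inequalities p_i * n_j <= p_j * (n_i + 1), i != j.

    Each inequality compares the observed counts with the vector obtained by
    moving one count from value j to value i; pairs with n_j == 0 give no
    constraint, hence "at most" k(k-1) hyperplanes.
    """
    p = [Fraction(x) for x in p]
    k = len(counts)
    return all(p[i] * counts[j] <= p[j] * (counts[i] + 1)
               for i in range(k) for j in range(k)
               if i != j and counts[j] > 0)


def dirichlet_estimate(counts, alpha):
    """Posterior mean under a symmetric Dirichlet(alpha) prior (assumed form).

    alpha = 0 gives the empirical distribution, alpha = 1 the Laplace estimator.
    """
    alpha = Fraction(alpha)
    denom = sum(counts) + len(counts) * alpha
    return [(c + alpha) / denom for c in counts]


if __name__ == "__main__":
    counts = (0, 3)
    for alpha in (0, Fraction(1, 2), 1, 2):
        est = dirichlet_estimate(counts, alpha)
        print(alpha, in_mls_bruteforce(counts, est), in_mls_pairwise(counts, est))
```

For the counts (0, 3), the estimates with alpha equal to 0, 1/2 and 1 pass both checks, while alpha = 2 fails. Under that prior, the vector (1, 2) becomes more likely than the observed (0, 3). Substituting the posterior mean into the pairwise inequality gives (alpha - 1) n_j <= alpha (n_i + 1). This suggests that, under these assumptions, the estimate always lies in the MLS for 0 <= alpha <= 1 and can leave it for larger alpha.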


  • The address information is for the authors that have a + after their name.
  • Authors who are presenting talks have a * after their name.


JSM 2005: For information, contact jsm@amstat.org or phone (888) 231-3473. If you have questions about the Continuing Education program, please contact the Education Department.
Revised March 2005