Abstract #301339


The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.





JSM 2002 Abstract #301339
Activity Number: 387
Type: Contributed
Date/Time: Thursday, August 15, 2002 : 8:30 AM to 10:20 AM
Sponsor: Biometrics Section*
Title: Estimating Multinomial Probabilities in SAGE (Serial Analysis of Gene Expression) Data
Author(s): Jeffrey Morris*+ and Keith Baggerly and Kevin Coombes
Affiliation(s): U. T. M. D. Anderson Cancer Center and U. T. M. D. Anderson Cancer Center and U. T. M. D. Anderson Cancer Center
Address: 1515 Holcombe Blvd, Houston, Texas, 77030-4009, USA
Keywords: Bioinformatics ; Multinomial Distribution ; SAGE ; Mixture Distributions ; Bayesian Methods
Abstract:

We are interested in estimating the relative frequencies in a multinomial distribution when the "distribution" of relative frequencies is strongly skewed, so there are many scarce classes and a few abundant ones, and the sample size is not large relative to the number of classes. This setting is encountered in SAGE, where the multinomial variates are the counts of "distinct tags," ten-base-pair sequences corresponding to mRNA transcripts, in a biological sample. Here, maximum likelihood estimates (MLEs) are not optimal for scarce classes, and standard Bayesian estimators have very high mean squared error (MSE) for the abundant ones. We develop a new Bayesian estimation procedure using a Stratified Dirichlet prior, which partitions the classes into two strata, called scarce and abundant, each with its own multivariate prior distribution. Our estimators automatically constrain the multinomial probabilities to sum to one, and incorporate a form of nonlinear shrinkage, yielding estimates close to the MLEs for classes with large counts, but shrunken estimates for classes with small counts. We demonstrate by simulation from a SAGE-like population that our method has smaller IMSE than either the MLE or the standard Bayesian estimator.
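
As a rough illustration of the two baseline estimators the abstract contrasts, the sketch below computes the MLE and a posterior mean under a single symmetric Dirichlet prior for a small, skewed count vector. It is not the Stratified Dirichlet procedure described above, whose details the abstract does not give. The counts, the function names, and the prior weight alpha = 1 are all assumptions chosen for illustration.

```python
def mle(counts):
    """Maximum likelihood estimate: observed relative frequencies."""
    n = sum(counts)
    return [c / n for c in counts]


def dirichlet_posterior_mean(counts, alpha=1.0):
    """Posterior mean under a symmetric Dirichlet(alpha) prior.

    This is a standard textbook estimator, used here only as a stand-in
    for the "standard Bayesian estimator" the abstract mentions.
    """
    n = sum(counts)
    k = len(counts)
    return [(c + alpha) / (n + k * alpha) for c in counts]


# Hypothetical tag counts: two abundant classes and many scarce ones.
counts = [500, 300, 3, 1, 1, 0, 0, 0, 0, 0]

p_mle = mle(counts)
p_bayes = dirichlet_posterior_mean(counts, alpha=1.0)

for c, a, b in zip(counts, p_mle, p_bayes):
    print(f"count={c:4d}  mle={a:.5f}  dirichlet={b:.5f}")
```

In this toy example the MLE assigns probability zero to every unobserved class, while the single-prior Bayesian estimate lifts the scarce classes at the cost of pulling the abundant classes downward. That trade-off is the motivation the abstract gives for treating the two strata separately.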


  • The address information is for the authors that have a + after their name.
  • Authors who are presenting talks have a * after their name.



Revised March 2002