Activity Number: 52
Type: Contributed
Date/Time: Sunday, August 11, 2002, 4:00 PM to 5:50 PM
Sponsor: Section on Statistical Computing*
Abstract - #301152
Title: Low-Storage, Sequential, Simultaneous Estimation of Multiple Quantiles for Massive Datasets
Author(s): James McDermott*+ and John Liechty and Dennis Lin
Affiliation(s): Pennsylvania State University and Pennsylvania State University and Pennsylvania State University
Address: 325 Thomas Bldg, University Park, Pennsylvania, 16802, USA
Keywords: quantile estimation; sequential methods; massive datasets; low-storage; datamining; data mining
Abstract:
We propose a low-storage, single-pass, sequential method for simultaneous estimation of multiple quantiles for massive datasets. The proposed method uses estimated ranks, assigned weights, and a scoring function that determines the most attractive candidate data points for estimates of the quantiles. The method uses a small, fixed amount of storage, and its computation time is O(n). Asymptotically, the proposed estimates are as accurate as the sample quantiles. We compare the proposed method's performance with that of the empirical distribution function through a simulation study.
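The abstract does not give the update rules for the estimated ranks, weights, or scoring function, so the sketch below does not reproduce the authors' method. As a point of reference only, it swaps in a different, well-known single-pass technique, Robbins-Monro stochastic approximation, to illustrate the two properties the abstract claims: fixed storage (one number per requested quantile) and O(n) time for a fixed set of quantiles. It also computes the sample quantiles from the empirical distribution function, which require keeping all n values. The function names `sa_quantiles` and `sample_quantiles`, the `step` parameter, and the uniform test data are hypothetical choices made for this illustration.

```python
import math
import random
from typing import Iterable, List, Sequence


def sa_quantiles(stream: Iterable[float], probs: Sequence[float],
                 step: float = 1.0) -> List[float]:
    """Single-pass Robbins-Monro estimates of several quantiles (hypothetical helper).

    Stand-in technique, NOT the rank/weight/scoring method of the abstract.
    Storage is one float per quantile; each observation costs O(len(probs)).
    """
    estimates: List[float] = []
    for t, x in enumerate(stream, start=1):
        if t == 1:
            # Initialise every estimate at the first observation.
            estimates = [x] * len(probs)
            continue
        # Decreasing gain; 'step' works best near 1 / (density at the quantile).
        gain = step / t
        for i, p in enumerate(probs):
            # Nudge up by gain*p when x is above the estimate,
            # down by gain*(1 - p) when x is at or below it.
            below = 1.0 if x <= estimates[i] else 0.0
            estimates[i] += gain * (p - below)
    return estimates


def sample_quantiles(data: Sequence[float], probs: Sequence[float]) -> List[float]:
    """Exact sample quantiles: smallest x with F_n(x) >= p. Needs all n values."""
    ordered = sorted(data)
    n = len(ordered)
    return [ordered[max(0, math.ceil(p * n) - 1)] for p in probs]


# Usage: compare both on a seeded Uniform(0, 1) stream.
rng = random.Random(2002)
data = [rng.random() for _ in range(200_000)]
probs = [0.1, 0.5, 0.9]
exact = sample_quantiles(data, probs)
approx = sa_quantiles(iter(data), probs)
print("sample quantiles:", exact)
print("single-pass SA:  ", approx)
```

Unlike the sample quantiles, the single-pass estimates never hold more than `len(probs)` numbers in memory. This simple scheme does not enforce that the estimates stay ordered across quantiles, and its accuracy depends on choosing `step` sensibly for the data, which are limitations a more elaborate method would need to address.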