|
Activity Number:
|
521
|
|
Type:
|
Contributed
|
|
Date/Time:
|
Wednesday, August 5, 2009 : 2:00 PM to 3:50 PM
|
|
Sponsor:
|
Section on Statistical Learning and Data Mining
|
| Abstract - #304418 |
|
Title:
|
Distinct Counting with a Self-Learning Bitmap
|
|
Author(s):
|
Aiyou Chen*+ and Jin Cao
|
|
Companies:
|
Bell Labs, Alcatel-lucent and Bell Labs, Alcatel-lucent
|
|
Address:
|
, , NJ, 07974,
|
|
Keywords:
|
distinct counting ; streaming data ; adaptive sampling ; Markov chain ; stopping time ; bitmap
|
|
Abstract:
|
Estimating the number of distinct values is a fundamental problem in database that has attracted extensive research over the past two decades, due to its wide applications, especially in the Internet. However, the performance of existing algorithms in terms of relative estimation accuracy usually depends on unknown cardinalities. In this paper we address the following question: Can a distinct-counting algorithm have uniformly reliable performance, i.e. constant relative estimation errors for unknown cardinalities in a wide range, say from tens to millions? We propose a self-learning bitmap algorithm to answer the question. Both theoretical and experimental studies demonstrate that with given space and computing resources, the algorithm is not only uniformly reliable but more accurate than state-of-the-art algorithms such as LogLog counting under common practice settings.
|
- The address information is for the authors that have a + after their name.
- Authors who are presenting talks have a * after their name.
Back to the full JSM 2009 program |