JSM Preliminary Online Program
This is the preliminary program for the 2009 Joint Statistical Meetings in Washington, DC.

The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.


Back to main JSM 2009 Program page




Activity Number: 521
Type: Contributed
Date/Time: Wednesday, August 5, 2009 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Mining
Abstract - #304418
Title: Distinct Counting with a Self-Learning Bitmap
Author(s): Aiyou Chen*+ and Jin Cao
Companies: Bell Labs, Alcatel-lucent and Bell Labs, Alcatel-lucent
Address: , , NJ, 07974,
Keywords: distinct counting ; streaming data ; adaptive sampling ; Markov chain ; stopping time ; bitmap
Abstract:

Estimating the number of distinct values is a fundamental problem in database that has attracted extensive research over the past two decades, due to its wide applications, especially in the Internet. However, the performance of existing algorithms in terms of relative estimation accuracy usually depends on unknown cardinalities. In this paper we address the following question: Can a distinct-counting algorithm have uniformly reliable performance, i.e. constant relative estimation errors for unknown cardinalities in a wide range, say from tens to millions? We propose a self-learning bitmap algorithm to answer the question. Both theoretical and experimental studies demonstrate that with given space and computing resources, the algorithm is not only uniformly reliable but more accurate than state-of-the-art algorithms such as LogLog counting under common practice settings.


  • The address information is for the authors that have a + after their name.
  • Authors who are presenting talks have a * after their name.

Back to the full JSM 2009 program


JSM 2009 For information, contact jsm@amstat.org or phone (888) 231-3473. If you have questions about the Continuing Education program, please contact the Education Department.
Revised September, 2008