Abstract:
Given the recent rise of large text data streams (e.g., Twitter, news feeds, and blogs), there is increased interest in algorithms for the analysis of such data. In particular, this analysis includes solutions to the problem of summarizing streaming content to identify topical trends within the stream. As clustering is the standard tool applied to data summarization tasks, numerous clustering-based approaches have been developed as solutions to this problem. Our approach focuses on the use of streaming data clustering algorithms based on the popular online-offline clustering paradigm. Specifically, we focus on the use of a novel density-based stream clustering approach to maintain a set of micro-clusters online whose current state, at any point in the stream, can be used to produce an offline macro-clustering (i.e., clusters of micro-clusters). The identification of trends is a natural extension of such an approach, as the main function of the online maintenance phase is to perform such operations as insertion, deletion, merging, and evolving of micro-clusters. Similarly, offline solutions can be compared by examining their micro-cluster membership information.
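To make the online-offline paradigm concrete, the following is a minimal generic sketch, not the density-based algorithm proposed in this abstract. It assumes documents have already been mapped to numeric vectors, and every name and parameter in it (MicroCluster, OnlineSummary, radius, decay, min_weight, link_dist) is hypothetical. Only insertion, fading, and deletion are shown online; merging of micro-clusters is omitted. The offline step links micro-clusters whose centers lie within link_dist of each other.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class MicroCluster:
    """Hypothetical micro-cluster: a center plus a time-decayed weight."""

    def __init__(self, point, t):
        self.center = list(point)
        self.weight = 1.0
        self.last_update = t

    def fade(self, t, decay):
        # Exponentially decay the weight for the time elapsed since last update.
        self.weight *= 2.0 ** (-decay * (t - self.last_update))
        self.last_update = t

    def absorb(self, point):
        # Add the point and shift the center toward it (weighted running mean).
        self.weight += 1.0
        step = 1.0 / self.weight
        self.center = [c + step * (x - c) for c, x in zip(self.center, point)]


class OnlineSummary:
    """Assumed parameters: radius, decay, and min_weight are illustrative."""

    def __init__(self, radius=1.0, decay=0.1, min_weight=0.25):
        self.radius = radius
        self.decay = decay
        self.min_weight = min_weight
        self.clusters = []

    def insert(self, point, t):
        # Online phase: fade every micro-cluster, delete the ones that have
        # become too light, then absorb the point or open a new micro-cluster.
        for mc in self.clusters:
            mc.fade(t, self.decay)
        self.clusters = [mc for mc in self.clusters
                         if mc.weight >= self.min_weight]
        nearest = min(self.clusters,
                      key=lambda mc: dist(mc.center, point),
                      default=None)
        if nearest is not None and dist(nearest.center, point) <= self.radius:
            nearest.absorb(point)
        else:
            self.clusters.append(MicroCluster(point, t))

    def macro_clusters(self, link_dist):
        # Offline phase: connected components over micro-cluster centers,
        # computed with a small union-find. Returns lists of indices into
        # self.clusters, i.e. micro-cluster membership per macro-cluster.
        n = len(self.clusters)
        parent = list(range(n))

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for i in range(n):
            for j in range(i + 1, n):
                a, b = self.clusters[i].center, self.clusters[j].center
                if dist(a, b) <= link_dist:
                    parent[find(i)] = find(j)

        groups = {}
        for i in range(n):
            groups.setdefault(find(i), []).append(i)
        return sorted(groups.values())
```

Under these assumptions, two offline solutions taken at different points in the stream could be compared by inspecting which micro-cluster indices each macro-cluster contains, which is the kind of membership comparison the abstract alludes to.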
Copyright © American Statistical Association.