464 – Statistical Software and High-Performance Computing
Multivariate Wavelet Density Estimation for Streaming Data: A Parallel Programming Approach
Kyle Caudle
South Dakota School of Mines and Technology
Christer Karlsson
South Dakota School of Mines and Technology
Larry Pyeatt
South Dakota School of Mines and Technology
Data streams pose challenges that are not normally encountered in standard statistical analysis. Foremost, the data arrive at such a high rate that storing them for later analysis is no longer feasible. Methods must be in place to make sense of the data in near real time. Typical methods involve streaming queries and model building. Density estimation is an essential tool for making sense of data collected by large-scale systems. Because of the curse of dimensionality, however, density estimation becomes problematic in higher dimensions. In this paper, we present a recursive method for constructing and updating an estimate of the non-stationary probability density function in a high-dimensional input space (i.e., more than four dimensions) using parallel programming. We draw samples from four different known multivariate densities to show the accuracy of our approach.
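To make the idea of a recursive update concrete, the following is a minimal serial sketch, not the authors' implementation. It assumes the Haar scaling function at a single resolution level on the unit hypercube [0, 1)^d, and an exponential forgetting weight so that older samples fade as the density drifts. The class name `StreamingHaarDensity` and the parameters `level` and `theta` are hypothetical, chosen only for illustration; the parallel decomposition described in the abstract is not shown.

```python
import numpy as np


class StreamingHaarDensity:
    """Haar scaling-function density estimate on [0, 1)^d, updated per sample.

    Illustrative assumption: one resolution level, exponential forgetting.
    """

    def __init__(self, dim, level, theta=0.01):
        self.dim = dim
        self.bins = 2 ** level                    # cells per axis
        self.theta = theta                        # forgetting weight (assumed)
        # Value of the tensor-product Haar scaling function on its support.
        self.scale = 2.0 ** (level * dim / 2.0)
        self.coef = np.zeros((self.bins,) * dim)  # scaling coefficients
        self.n = 0

    def _cell(self, x):
        # Index of the dyadic cell containing x, clipped to the grid.
        idx = np.floor(np.asarray(x, dtype=float) * self.bins).astype(int)
        return tuple(np.clip(idx, 0, self.bins - 1))

    def update(self, x):
        # Running mean for the first 1/theta samples, then exponential
        # discounting: c <- (1 - w) * c + w * phi(x).
        self.n += 1
        w = max(1.0 / self.n, self.theta)
        self.coef *= 1.0 - w
        self.coef[self._cell(x)] += w * self.scale

    def pdf(self, x):
        # f(x) = sum_k c_k * phi_k(x); only the cell containing x contributes.
        return self.scale * self.coef[self._cell(x)]


rng = np.random.default_rng(0)
est = StreamingHaarDensity(dim=5, level=2)
for sample in rng.random((2000, 5)):
    est.update(sample)
```

Each update touches every coefficient once and never revisits stored samples, which is what makes this kind of scheme suitable for streams. The coefficient array grows as 2^(level * d), so the per-update cost rises quickly with dimension; the rescaling step acts on each coefficient independently, which is one natural place to split the work across processors.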