79 – Applications of Risk Analysis
Rolling Up Random Variables in Data Cubes
Phillip Yelland
Google Inc.
Data cubes, first developed in the context of online analytical processing (OLAP) applications for databases, have become increasingly widespread as a means of structuring data aggregations in other contexts. For example, increasing levels of aggregation in a data cube can be used to impose a hierarchical structure on sets of cross-categorized values, producing a summary description that takes advantage of commonalities within the cube categories. In this paper, we describe a novel technique for realizing such a hierarchical structure in a data cube containing discrete random variables. Using a generalization of an approach due to Chow and Liu, this technique produces a parsimonious approximation to the joint distribution of the variables in terms of the aggregation structure of the cube. The efficacy of the technique is illustrated using a real-life application that involves monitoring and reporting anomalies in Web traffic streams over time.
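For readers unfamiliar with the approach due to Chow and Liu mentioned above, the following is a minimal sketch of the classic Chow-Liu procedure only, not of the generalization to data cubes described in the paper. It approximates the joint distribution of discrete variables by a tree whose edges maximize total pairwise mutual information. The function names `mutual_information` and `chow_liu_tree`, and the column-dictionary input format, are illustrative assumptions and do not come from the paper.

```python
from collections import Counter
from itertools import combinations
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete samples."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum p(x, y) * log(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def chow_liu_tree(columns):
    """Return the edges of a maximum-weight spanning tree over the variables.

    `columns` (a hypothetical input format) maps each variable name to its
    list of observed values; all lists are assumed to have equal length.
    """
    names = list(columns)
    # Score every pair of variables by empirical mutual information,
    # highest first.
    scored = sorted(
        ((mutual_information(columns[a], columns[b]), a, b)
         for a, b in combinations(names, 2)),
        reverse=True,
    )
    # Kruskal's algorithm with a simple union-find to avoid cycles.
    parent = {v: v for v in names}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    edges = []
    for _, a, b in scored:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            edges.append((a, b))
    return edges
```

The resulting tree has one edge fewer than the number of variables, so the joint distribution is approximated using only pairwise conditional distributions along those edges, which is the sense in which such an approximation is parsimonious.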