Online Program

All Times EDT

Thursday, June 4
Data Visualization
Education
Education and Data Visualization Posters
Thu, Jun 4, 10:00 AM - 1:00 PM
TBD
 

CatViz for Visual Exploration of High-Dimensional Categorical Data Sets (308408)

James T Klosowski, AT&T Labs Research 
Eleftherios Koutsofios, AT&T Labs Research 
*Raif Rustamov, AT&T Labs Research 
Gordon Woodhull, AT&T Labs Research 

Keywords: variable relationships, categorical data

In this e-poster we introduce and demo CatViz (CATegorical data VIZualization), a tool for exploring relationships between variables in categorical datasets with a large number of columns. Such datasets arise in many contexts. A common approach to understanding them is to compute a correlation or similarity metric (e.g., mutual information, Jaccard similarity) between each pair of columns and analyze the resulting pairwise similarity matrix. For datasets with many columns, however, this quickly becomes unwieldy, and a more interactive, visual approach is preferable.
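As a rough illustration of the pairwise similarity matrix mentioned above, the following sketch computes empirical mutual information between every pair of categorical columns. It is only a minimal example under stated assumptions: the function names (`mutual_information`, `similarity_matrix`) and the toy columns are hypothetical and are not taken from CatViz.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two categorical columns."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of the first column
    py = Counter(ys)            # marginal counts of the second column
    pxy = Counter(zip(xs, ys))  # joint counts
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written in terms of counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def similarity_matrix(columns):
    """Return {(name_a, name_b): MI} for every unordered pair of columns."""
    names = list(columns)
    return {
        (a, b): mutual_information(columns[a], columns[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical toy data: "shape" is fully determined by "color",
# while "size" is independent of both.
columns = {
    "color": ["r", "r", "b", "b"],
    "shape": ["sq", "sq", "ci", "ci"],
    "size": ["s", "l", "s", "l"],
}
sim = similarity_matrix(columns)
```

With d columns this produces d(d-1)/2 entries, which is why reading the matrix directly becomes unwieldy as d grows.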

CatViz loads one or more graphs derived from the pairwise variable similarity matrix, along with the original data, and allows the user to explore them in a variety of ways. The primary visual display is a graph showing relationships between data columns (i.e., variables). Each node corresponds to a data column, and each edge is weighted by the corresponding entry in the similarity matrix; unless constraints (e.g., causality) are imposed, this graph is fully connected. CatViz makes it easy to filter edges by weight and to visually distinguish strong relationships from weak ones. The nodes can be clustered on the fly, grouping related columns together. Selecting any node displays more detail about the node and its neighbors, as well as a histogram of the values taken by the corresponding data column. When multiple nodes are selected, their histograms are linked to allow in-depth probing of the relationships between the columns. CatViz also supports exploring higher-order relationships among variables by displaying decision trees that predict a selected variable from all other variables. During the demo we will showcase applications of CatViz to a number of real-world datasets.
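The edge filtering and node grouping described above can be sketched as follows. This is an assumption-laden illustration, not CatViz's implementation: the abstract does not say which clustering method is used, so connected components of the thresholded graph stand in for it here, and the names `filter_edges` and `connected_components` and the toy weights are hypothetical.

```python
def filter_edges(weights, threshold):
    """Keep only edges whose similarity weight is at least `threshold`."""
    return {pair: w for pair, w in weights.items() if w >= threshold}


def connected_components(nodes, edges):
    """Group nodes linked by the given edges (union-find with path halving)."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # shorten the path as we walk up
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical similarity weights between four columns a, b, c, d.
weights = {
    ("a", "b"): 0.9,
    ("a", "c"): 0.1,
    ("b", "c"): 0.05,
    ("c", "d"): 0.7,
}
strong = filter_edges(weights, threshold=0.5)
clusters = connected_components(["a", "b", "c", "d"], strong)
```

Raising the threshold removes weak edges, so the groups of related columns split into smaller ones; lowering it merges them until the graph is again fully connected.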