423 – Contributed Oral Poster Presentations: Social Statistics Section
Model-Based and Scalable Relationship Discovery in Business Analytics
Jane Chu
IBM
Sier Han
IBM
Jing Shyr
IBM
Damir Spisic
IBM SPSS Predictive Analytics
Xue Ying Zhang
This paper proposes a model-based, scalable system that produces a series of tabular reports illustrating the important measure-dimension (that is, continuous target and categorical predictor) relationships and highlighting strong interactions among dimensions. The analysis behind each report is based on a linear regression model, and interactions are detected with an ANOVA test. Only basic statistics are needed to evaluate model accuracy and to conduct the ANOVA tests, so no models are actually fitted. For datasets with a large number of categorical predictors, generating and analyzing all possible tables becomes prohibitive, so a structured and scalable search process is applied: all tables with a single dimension are considered first, and tables with two or more dimensions are considered selectively, based on the analysis of the corresponding lower-dimensional tables. This limits the computational effort needed to generate and analyze the tables. Finally, the top selected tables are analyzed further to detect cells, each corresponding to a combination of categories of the table's dimensions, that contribute strongly to the significance of the interaction effect.
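
To illustrate how a regression-style accuracy measure and an ANOVA test can be obtained from basic statistics alone, the following is a minimal sketch for a single-dimension table. It is not the authors' implementation: the function names (summarize, merge, one_way_anova) and the sample data are assumptions made for illustration. It relies only on the standard one-way ANOVA identities, which need the count, sum, and sum of squares of the measure within each category.

```python
from collections import defaultdict


def summarize(rows):
    """One pass over (category, value) pairs.

    Returns {category: [count, sum, sum_of_squares]} for the measure.
    """
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for category, value in rows:
        s = stats[category]
        s[0] += 1
        s[1] += value
        s[2] += value * value
    return dict(stats)


def merge(a, b):
    """Combine summaries computed on two disjoint data partitions."""
    out = {k: list(v) for k, v in a.items()}
    for k, v in b.items():
        if k in out:
            out[k] = [x + y for x, y in zip(out[k], v)]
        else:
            out[k] = list(v)
    return out


def one_way_anova(stats):
    """R-squared and F statistic of measure ~ dimension from summaries only.

    Assumes at least two categories, more records than categories,
    and a measure that is not constant.
    """
    n = sum(s[0] for s in stats.values())
    total = sum(s[1] for s in stats.values())
    total_sq = sum(s[2] for s in stats.values())
    k = len(stats)

    correction = total ** 2 / n
    ss_total = total_sq - correction
    ss_between = sum(s[1] ** 2 / s[0] for s in stats.values()) - correction
    ss_within = ss_total - ss_between

    r_squared = ss_between / ss_total
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return r_squared, f_stat


# Hypothetical data: one measure, one dimension with categories A and B.
rows = [("A", 1.0), ("A", 2.0), ("A", 3.0), ("B", 4.0), ("B", 5.0), ("B", 6.0)]
r_squared, f_stat = one_way_anova(summarize(rows))
print(round(r_squared, 4), f_stat)
```

Because the per-category summaries are additive, they can be computed separately on data partitions and combined with merge, which is one plausible reason an approach built on basic statistics scales to large datasets. The interaction tests and the cell-level contribution analysis described in the abstract are not covered by this sketch.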