Abstract:
In real-world predictive modeling applications, segmentation-based modeling techniques are often employed: data records are partitioned into segments, and a separate predictive model is developed for each segment. It is common practice to build these models sequentially, first segmenting the data (using, for example, unsupervised clustering algorithms) and then developing predictive models for the resulting segments. This approach, however, ignores the strong influence that the choice of segmentation has on the predictive accuracy of the segment models. It would be preferable to optimize the segmentation so as to maximize overall predictive accuracy. This talk will discuss the IBM ProbE (TM) predictive modeling system, which accomplishes this optimization by combining decision tree techniques with statistical modeling performed at the leaves of the trees. At present, ProbE can perform stepwise linear regression and stepwise naive Bayes modeling at the leaves as the trees are being constructed. The resulting models have been found to perform as well as or better than hand-crafted models in both credit-risk assessment and targeted marketing applications.
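The central idea, choosing segment boundaries by how well the segment models predict rather than by clustering the inputs, can be illustrated with a short Python sketch. This is not ProbE's algorithm. It is a hypothetical single-split example that assumes one segmentation variable `seg`, one predictor `x`, simple least-squares lines as leaf models (standing in for stepwise regression), and training-set squared error as the split criterion; a practical system would need holdout data or a penalized criterion to avoid overfitting. The names `fit_line`, `best_split`, and `min_leaf` are illustrative only.

```python
# Hypothetical sketch: pick a segmentation split by leaf-model accuracy.
# Not ProbE's implementation; leaf models are simple least-squares lines.
from typing import List, Optional, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = a + b*x; returns (a, b, sum of squared errors)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0  # constant x: fall back to a flat model
    a = mean_y - b * mean_x
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def best_split(
    seg: List[float], xs: List[float], ys: List[float], min_leaf: int = 3
) -> Tuple[Optional[float], float]:
    """Return the threshold t on `seg` (rule: seg <= t vs. seg > t) that minimizes
    the combined SSE of the two leaf regressions, or (None, sse) if no admissible
    split beats a single unsegmented model."""
    _, _, best_sse = fit_line(xs, ys)  # baseline: one model for all records
    best_t: Optional[float] = None
    for t in sorted(set(seg))[:-1]:  # the largest value would leave the right leaf empty
        left = [i for i, s in enumerate(seg) if s <= t]
        right = [i for i, s in enumerate(seg) if s > t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue  # too few records to fit a leaf model
        _, _, sse_l = fit_line([xs[i] for i in left], [ys[i] for i in left])
        _, _, sse_r = fit_line([xs[i] for i in right], [ys[i] for i in right])
        if sse_l + sse_r < best_sse:
            best_sse, best_t = sse_l + sse_r, t
    return best_t, best_sse


# Toy data: y = 2x for seg <= 3 and y = 10 - x for seg > 3.
seg = [0, 1, 2, 3, 4, 5, 6, 7]
xs = [1, 2, 3, 4, 1, 2, 3, 4]
ys = [2, 4, 6, 8, 9, 8, 7, 6]
print(best_split(seg, xs, ys))  # (3, 0.0)
```

On this toy data the split at `seg <= 3` separates two exact linear relationships, so the combined error drops to zero, whereas a segmentation chosen without looking at `y` need not find that boundary. A tree builder would apply the same search recursively within each leaf.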