Abstract:
Predictive analytics has been widely used in strategic marketing to uncover actionable information for a range of critical marketing decisions. In today's big data era, advanced technologies and digital processing generate data of unprecedented variety, volume, and velocity for businesses of all kinds. Although the big data phenomenon has driven the development of many accommodating platforms and analytical algorithms, the computational and analytical challenges that big data poses for data mining persist and continue to call for effective techniques to reduce data dimensionality.
In this study we concentrate on reducing the number of categories in categorical variables when preparing inputs for predictive modeling. By default, the Feature Selection node in SPSS Modeler screens out variables whose number of categories, expressed as a percentage of the number of records, exceeds 95%. This criterion provides little help, however, because few variables have so many categories that they exceed 95% of the number of records. In our practice of predictive modeling in the telecommunications industry, we have found that supervised reduction of the number of categories effectively turns some otherwise useless categorical variables into contributing predictors that significantly enhance model accuracy.
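As a rough illustration only, the Python sketch below contrasts the two ideas in the preceding paragraph. `fails_category_screen` restates the percentage-of-records screening criterion in plain code (it does not call SPSS Modeler), and `supervised_category_map` is a hypothetical supervised reduction, not necessarily the procedure used in this study: it pools rare categories into `OTHER` and merges the remaining categories into a few groups with similar target rates. The function names, the `min_count` and `n_groups` parameters, and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def fails_category_screen(values, max_pct=95.0):
    """True if distinct categories exceed max_pct percent of the records.

    Plain restatement of the screening rule; max_pct=95.0 mirrors the stated default.
    """
    if not values:
        return False
    return 100.0 * len(set(values)) / len(values) > max_pct


def supervised_category_map(values, targets, n_groups=3, min_count=2):
    """Hypothetical supervised reduction: merge categories with similar target rates.

    Categories seen fewer than min_count times are pooled into "OTHER"; the rest
    are sorted by the mean of a 0/1 target and split into n_groups contiguous
    groups of near-equal size.
    """
    counts = defaultdict(int)
    positives = defaultdict(int)
    for v, y in zip(values, targets):
        counts[v] += 1
        positives[v] += y

    frequent = [c for c in counts if counts[c] >= min_count]
    # Sort by target rate; break ties by category name so the result is deterministic.
    frequent.sort(key=lambda c: (positives[c] / counts[c], str(c)))

    mapping = {c: "OTHER" for c in counts if counts[c] < min_count}
    k = max(1, min(n_groups, len(frequent)))
    for i, c in enumerate(frequent):
        # Integer arithmetic assigns the i-th ranked category to one of k groups.
        mapping[c] = f"G{i * k // len(frequent) + 1}"
    return mapping


# Made-up toy data: a plan-code field and a 0/1 churn flag.
plans = ["A", "A", "B", "B", "C", "C", "D", "D", "E", "E", "F", "F", "Z"]
churn = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1]

print(fails_category_screen(plans))  # 7 categories / 13 records is about 54%, so False
print(supervised_category_map(plans, churn))
```

Here the screen keeps the seven-category field even though it is sparse, while the supervised map collapses it to three target-rate groups plus `OTHER`. Because the mapping uses the target, it would normally be learned on training data only and then applied unchanged to validation data. Ranking categories by target rate is only one way to supervise the merge; chi-square-based merging of the kind used by CHAID is a common alternative.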