Abstract Details

Activity Number: 353
Type: Contributed
Date/Time: Tuesday, August 2, 2016 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #318645
Title: Dimension-Reduction Techniques for Predictive Modeling
Author(s): Zhen Zhang* and Lei Zhang and Kendell Churchwell and James Veillette
Companies: C Spire and Mississippi State Department of Health and C Spire and C Spire
Keywords: dimension reduction ; predictive modeling ; supervised reduction ; algorithm ; attributes ; categorical variable

Predictive analytics is widely used in strategic marketing to uncover actionable information for a range of critical marketing decisions. In today's big data era, advanced technologies and digital processing generate data of unprecedented variety, volume, and speed for businesses of all kinds. Although the big data phenomenon has driven the development of a number of accommodating platforms and analytical algorithms, the computational and analytical challenges associated with big data persist in the data mining world and continue to call for effective techniques to reduce data dimensions.

In this study we concentrate on reducing the number of categories in categorical variables when preparing inputs for predictive modeling. SPSS MODELER has a Feature Selection node that, by default, filters out variables whose number of categories, as a percentage of the number of records, exceeds 95%. This default provides little help, however, because few variables have a number of categories exceeding 95% of the number of records. In our practice of predictive modeling in the telecommunication industry, we find that supervised reduction of the number of categories effectively changes some otherwise useless categorical variables into contributing predictors that significantly enhance model accuracy.
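The abstract does not specify the reduction algorithm, so the following is only a minimal sketch of one common way to do supervised category reduction, not the authors' method. It assumes a binary target (for example, a churn flag) and groups categories by their observed target rate. The function names, the `G0`/`G1`/`OTHER` labels, and the `n_groups` and `min_count` parameters are all hypothetical. The `category_ratio` helper computes the categories-to-records ratio that the 95% screening rule described above is based on.

```python
from collections import defaultdict


def category_ratio(values):
    """Number of distinct categories as a fraction of the number of records."""
    return len(set(values)) / len(values)


def supervised_reduce(values, targets, n_groups=2, min_count=2):
    """Return a mapping from each original category to a reduced group label.

    Assumption: `targets` holds 0/1 outcomes. Categories seen fewer than
    `min_count` times are pooled into "OTHER"; the remaining categories are
    ranked by observed target rate and split into `n_groups` bins of
    roughly equal size.
    """
    counts = defaultdict(int)
    positives = defaultdict(int)
    for value, target in zip(values, targets):
        counts[value] += 1
        positives[value] += target

    frequent = [c for c in counts if counts[c] >= min_count]
    # Rank by target rate; break ties by category name so the result is deterministic.
    ranked = sorted(frequent, key=lambda c: (positives[c] / counts[c], c))

    mapping = {}
    for rank, category in enumerate(ranked):
        mapping[category] = f"G{rank * n_groups // len(ranked)}"
    for category in counts:
        mapping.setdefault(category, "OTHER")
    return mapping


# Hypothetical toy data: a five-level categorical input and a binary outcome.
values = ["A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D", "E"]
targets = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1]

mapping = supervised_reduce(values, targets)
reduced = [mapping[v] for v in values]
print(category_ratio(values), category_ratio(reduced))
```

In this sketch the five original levels collapse to three groups. In practice the target rates would be estimated on training data only, so that the grouping does not leak outcome information into validation or test sets.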

Authors who are presenting talks have a * after their name.

Back to the full JSM 2016 program

Copyright © American Statistical Association