245 – Speed Session #2: Topics in Biopharmaceutical Statistics and Programming and Analysis, Part 2
Information Value Statistic and Predictors for Logistic Regression
Bruce Lund
Data Mining Consultant
In preparing predictor variables for a binary logistic regression model, it is common to collapse the levels of a nominal or discrete-valued predictor X to achieve parsimony while maintaining predictive power. Once the levels have been binned, the binned predictor is transformed to weight-of-evidence (WOE) coding for use as a predictor in the model. The first section of the paper gives an algorithm for collapsing the levels of a nominal or discrete-valued predictor X for predicting binary Y so that information value (IV) is maximized at each step of the collapsing. If X is ordinal, the algorithm allows the ordering of X to be maintained during the collapsing. This algorithm is coded in SAS®. The second section gives a process to simulate the probability distribution of IV under the assumption of no association between X and Y. Since, in practice, IV does not have a parametric probability distribution, this simulation provides a tool for rejecting an IV that is not statistically significant.
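To make the quantities in the abstract concrete, the following is a minimal Python sketch, not the author's SAS code. It uses the standard definitions WOE_j = ln(g_j / b_j) and IV = sum over levels j of (g_j - b_j) * WOE_j, where g_j and b_j are the shares of events (Y = 1) and non-events (Y = 0) falling in level j. The function names (`tabulate`, `iv_from_counts`, `collapse_once`, `null_iv_p_value`) and the `smooth` parameter are hypothetical and introduced here for illustration only. The null distribution is approximated by permuting Y, which is one plausible reading of "no association between X and Y" and may differ from the paper's procedure.

```python
import math
import random
from itertools import combinations


def tabulate(x, y):
    """Return [(events, non_events), ...], one pair per sorted level of x."""
    levels = sorted(set(x))
    index = {level: i for i, level in enumerate(levels)}
    counts = [[0, 0] for _ in levels]
    for xi, yi in zip(x, y):
        counts[index[xi]][0 if yi == 1 else 1] += 1
    return [tuple(c) for c in counts]


def iv_from_counts(counts, smooth=0.0):
    """Information value for a list of (events, non_events) pairs.

    Assumption: `smooth` is an illustrative additive adjustment so that
    zero cells do not make the log undefined. With smooth=0.0 every level
    must contain at least one event and one non-event.
    """
    k = len(counts)
    n1 = sum(e for e, _ in counts) + smooth * k
    n0 = sum(n for _, n in counts) + smooth * k
    iv = 0.0
    for e, n in counts:
        g = (e + smooth) / n1          # share of events in this level
        b = (n + smooth) / n0          # share of non-events in this level
        iv += (g - b) * math.log(g / b)  # (g - b) * WOE
    return iv


def collapse_once(counts, ordinal=False):
    """Merge the one pair of levels whose merger leaves IV highest.

    If ordinal is True only adjacent levels are candidates, so the
    ordering of X is preserved. Requires at least two levels.
    """
    k = len(counts)
    if ordinal:
        pairs = [(i, i + 1) for i in range(k - 1)]
    else:
        pairs = list(combinations(range(k), 2))
    best_iv, best_counts = None, None
    for i, j in pairs:
        merged = [c for idx, c in enumerate(counts) if idx not in (i, j)]
        merged.insert(i, (counts[i][0] + counts[j][0],
                          counts[i][1] + counts[j][1]))
        iv = iv_from_counts(merged)
        if best_iv is None or iv > best_iv:
            best_iv, best_counts = iv, merged
    return best_counts


def null_iv_p_value(x, y, n_sims=500, seed=0, smooth=0.5):
    """Permutation estimate of P(IV >= observed) under no X-Y association."""
    rng = random.Random(seed)
    observed = iv_from_counts(tabulate(x, y), smooth)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_sims):
        rng.shuffle(y_perm)  # breaks any link between X and Y
        if iv_from_counts(tabulate(x, y_perm), smooth) >= observed:
            exceed += 1
    return (exceed + 1) / (n_sims + 1)
```

Calling `collapse_once` repeatedly until a target number of levels remains gives a greedy collapsing sequence; a small value from `null_iv_p_value` suggests the observed IV would be unlikely if X carried no information about Y.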