Online Program
Saturday, February 20
CS19 Big Data Tools
Sat, Feb 20, 9:15 AM - 10:45 AM
Topaz
Machine Learning Variable Selection for Credit Risk Modeling (303147)
*Jie Chen, Wells Fargo
Agus Sudjianto, Wells Fargo
Keywords: credit risk modeling, machine learning, variable selection, filter methods, embedded methods, R, H2O, Aster, diagnosis tool, model validation
In financial institutions, variable selection for credit risk modeling requires consistency with economic theory and business intuition. However, hundreds of irrelevant and redundant variables, various transformations, and large volumes of loan-level consumer data make variable selection challenging. Various powerful machine learning variable selection methods are being introduced as diagnosis tools to support model validation activities and independent testing. These methods are typically categorized into filter, wrapper, and embedded methods. We will implement the filter and embedded methods on actual large-scale consumer data using the H2O computing engine and the Aster analytical database. Our diagnosis tools will cover the following variable selection functionalities: importance ranking, interaction selection, nonlinearity detection, hierarchical clustering with a mutual-information-based correlation distance, and a study of the impact of covariate correlation on prediction, based on mutual information decomposition.
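As a rough illustration of the filter-method idea mentioned in the abstract, the sketch below ranks discrete candidate variables by their empirical mutual information with a binary target. It is a plain-Python toy, not the authors' H2O or Aster implementation, which the abstract does not describe; the function names, feature names, and data are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def rank_by_mutual_information(features, target):
    """Filter-style ranking: score each variable independently of any model."""
    scores = {name: mutual_information(col, target) for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: a default indicator and three candidate variables.
target = [0, 0, 1, 1, 0, 1, 0, 1]
features = {
    "delinquent_flag": [0, 0, 1, 1, 0, 1, 0, 1],  # identical to the target
    "region_code": [0, 1, 0, 1, 0, 1, 0, 1],      # weakly related to the target
    "constant": [1, 1, 1, 1, 1, 1, 1, 1],         # carries no information
}

ranking = rank_by_mutual_information(features, target)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

Because a filter method scores each variable separately, it scales easily but cannot by itself detect redundancy between variables, which is one reason it might be paired with clustering or embedded methods.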