Online Program

Saturday, February 20
CS19 Big Data Tools Sat, Feb 20, 9:15 AM - 10:45 AM
Topaz

Machine Learning Variable Selection for Credit Risk Modeling (303147)

*Jie Chen, Wells Fargo 
Agus Sudjianto, Wells Fargo 

Keywords: credit risk modeling, machine learning, variable selection, filter methods, embedded methods, R, H2O, Aster, diagnosis tool, model validation

In financial institutions, variable selection for credit risk modeling must be consistent with economic theory and business intuition. However, hundreds of irrelevant and redundant variables, numerous candidate transformations, and large volumes of loan-level consumer data make variable selection challenging. Powerful machine learning variable selection methods are being introduced as diagnosis tools to support model validation and independent testing. These methods are typically categorized as filter, wrapper, and embedded methods. We will implement the filter and embedded methods on actual large-scale consumer data using the H2O computing engine and the Aster analytical database. Our diagnosis tools will cover the following variable selection functionality: importance ranking, interaction selection, nonlinearity detection, hierarchical clustering with a mutual-information-based correlation distance, and a study of how covariate correlation affects prediction, based on mutual information decomposition.
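As a rough illustration of what a filter method for importance ranking can look like, the sketch below scores each candidate variable by its empirical mutual information with a binary default indicator and sorts the variables by that score. It is a minimal standard-library Python example, not the H2O or Aster implementation described in the abstract; the function names, variable names, and toy data are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (count_x[a] * count_y[b]))
    return mi


def rank_features(features, target):
    """Return (name, score) pairs sorted from most to least informative."""
    scores = {name: mutual_information(col, target)
              for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up toy data: 8 loans, binary default indicator (hypothetical).
default = [0, 0, 1, 1, 0, 1, 0, 1]
features = {
    "region_code":     [0, 1, 0, 1, 0, 1, 0, 1],  # weakly related to default
    "constant_flag":   [1, 1, 1, 1, 1, 1, 1, 1],  # carries no information
    "delinquent_flag": [0, 0, 1, 1, 0, 1, 0, 1],  # identical to default
}

ranking = rank_features(features, default)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

A filter of this kind scores each variable independently of any model, which keeps it cheap on large data but means it cannot see interactions or redundancy among variables; those are the cases the embedded methods and the clustering step named in the abstract are meant to address.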