JSM 2011 Online Program

The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.

Abstract Details

Activity Number: 79
Type: Contributed
Date/Time: Sunday, July 31, 2011 : 4:00 PM to 5:50 PM
Sponsor: Section on Statistical Consulting
Abstract - #302494
Title: Applying Regression to the Large and Unbalanced Data: Predicting Loan Status
Author(s): Shi Zhao*+
Companies: Xerox Innovation Group
Keywords: large data ; unbalanced data ; variable selection ; loan default
Abstract:

A loan management agency can act proactively if it understands the trend of loan status well. We treat this as a data mining problem and use regression techniques to predict loan status, an ordinal categorical outcome. Two critical issues have emerged recently and have not been studied well. The first is the "big data" problem, for which most variable selection algorithms are unsuitable. We address it by combining a parallel computing approach with the risk inflation criterion (RIC) to avoid over-fitting and keep the model interpretable. The second is the low frequency of loan defaults, a setting in which false negatives are much more costly than false positives. We place more weight on defaults to improve prediction accuracy, i.e., to commit fewer false negatives, and we discuss how to determine the weights. We also show how to use the model to generate early alarms for loans likely to default, which is desirable for the loan management agency.
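The effect of weighting defaults more heavily can be illustrated with a minimal sketch. It is not the author's method. It assumes a simplified binary outcome (default vs. no default) rather than the ordinal loan status in the abstract, and it uses a weighted log loss as a stand-in for whatever weighted regression criterion the talk uses. The function name `weighted_log_loss`, the toy data, and the weight of 10 are all hypothetical choices made for illustration.

```python
import math

def weighted_log_loss(y_true, p_pred, default_weight=1.0):
    """Weighted average log loss; default cases (y == 1) count `default_weight` times."""
    total, weight_sum = 0.0, 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, 1e-12), 1 - 1e-12)  # clip so log() stays finite
        w = default_weight if y == 1 else 1.0
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum

# Toy, made-up data: one default among five loans.
y = [0, 0, 0, 0, 1]
miss_default = [0.1, 0.1, 0.1, 0.1, 0.1]  # scores the default as low risk (false negative)
false_alarm = [0.1, 0.1, 0.1, 0.9, 0.9]   # flags the default, plus one good loan (false positive)

for w in (1.0, 10.0):
    print(w, weighted_log_loss(y, miss_default, w), weighted_log_loss(y, false_alarm, w))
```

With equal weights the two sets of predictions score the same, so the loss does not distinguish a missed default from a false alarm. Once defaults are up-weighted, the predictions that miss the default are penalized more heavily. How large the weight should be is the question the abstract says the talk discusses, and this sketch does not address it.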


The address information is for the authors that have a + after their name.
Authors who are presenting talks have a * after their name.
