Online Program

Saturday, May 19
Data Science
Time-based Models
Sat, May 19, 8:30 AM - 10:00 AM
Lake Fairfax B

Bankruptcy Prediction Using Selective Under-Sampling and Multiple-Year Data: A Study on North American Companies (304476)

*Son Nguyen, Bryant University 

Keywords: Bankruptcy prediction, financial distress, undersampling, random forest

Bankruptcy prediction has been widely studied in accounting, finance, and business because of its significant impact on society. In machine learning terms, it is an imbalanced classification problem: the number of observations in one class (non-bankruptcy) is much larger than the number in the other (bankruptcy). Resampling methods, which balance the data before it is fitted to a traditional classification model, offer a simple solution to this imbalance. In this study, we develop a new resampling technique that selectively under-samples the class of non-bankrupt companies under the “guidance” of a random forest. We also propose fitting models on multiple-year data for each company instead of the single-year data used in most of the literature. Our experiments on data from North American firms from 1997 to 2016 show that our resampling technique compares favorably with the most popular existing techniques and that using multiple-year data can significantly improve the performance of bankruptcy prediction models.
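As a minimal illustration of the two ideas in the abstract, the Python sketch below shows (a) plain random under-sampling, a common baseline resampling method and not the random-forest-guided selective technique proposed in the study, whose selection rule the abstract does not specify, and (b) one assumed way to build a multiple-year feature vector by concatenating a firm's financial ratios over consecutive years. The function names, the `(firm, year)`-keyed dictionary layout, and the `n_years` parameter are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def random_undersample(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    majority_label: int = 0,
    seed: int = 0,
) -> Tuple[List[Sequence[float]], List[int]]:
    """Baseline random under-sampling (hypothetical helper).

    Randomly drops majority-class rows (e.g. non-bankrupt firms) until the
    majority class is no larger than the minority class.
    """
    rng = random.Random(seed)
    majority = [i for i, label in enumerate(y) if label == majority_label]
    minority = [i for i, label in enumerate(y) if label != majority_label]
    # Keep only as many majority rows as there are minority rows.
    kept = rng.sample(majority, k=min(len(majority), len(minority)))
    idx = sorted(kept + minority)  # preserve the original row order
    return [X[i] for i in idx], [y[i] for i in idx]


def multi_year_features(
    ratios: Dict[Tuple[str, int], List[float]],
    firm: str,
    year: int,
    n_years: int = 3,
) -> List[float]:
    """Assumed multiple-year layout (hypothetical helper).

    Concatenates a firm's ratio vectors for year, year-1, ..., year-n_years+1
    into a single feature row. Raises KeyError if any year is missing.
    """
    row: List[float] = []
    for lag in range(n_years):
        row.extend(ratios[(firm, year - lag)])
    return row
```

For example, with five non-bankrupt rows and one bankrupt row, `random_undersample` returns one row of each class; a selective scheme would replace the random `rng.sample` call with some model-informed choice of which majority rows to keep.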