Keywords: Bankruptcy prediction, financial distress, undersampling, random forest,
Bankruptcy prediction has been widely studied in accounting, finance, and business because of its critical impact on society. In machine learning terms, it is an imbalanced classification problem: the number of observations in one class (non-bankruptcy) is much larger than the number in the other (bankruptcy). Resampling methods, which balance the data before a traditional classification model is fitted to it, offer a simple solution to the imbalance issue. In this study, we develop a new resampling technique that selectively under-samples the class of non-bankrupt companies under the “guidance” of a random forest. We also propose fitting models on multiple-year company data instead of single-year data, as is done in most of the literature. Our experiments on data from North American firms from 1997 to 2016 show that our resampling technique compares favorably with the most popular ones and that using multiple-year data can significantly improve the performance of bankruptcy prediction models.
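To make the resampling idea concrete, the following minimal sketch balances a toy data set by plain random under-sampling of the majority class. It is not the random-forest-guided technique proposed in this study, whose selection rule is not described in this abstract. The function name `random_undersample`, the label coding (0 for non-bankruptcy, 1 for bankruptcy), and the toy data are assumptions made purely for illustration.

```python
import random
from collections import Counter


def random_undersample(X, y, majority_label=0, seed=0):
    """Return a balanced copy of (X, y) by randomly dropping majority-class rows.

    Hypothetical helper for illustration only; it implements plain random
    under-sampling, not the random-forest-guided selection of this study.
    """
    rng = random.Random(seed)
    majority = [i for i, label in enumerate(y) if label == majority_label]
    minority = [i for i, label in enumerate(y) if label != majority_label]
    # Keep as many majority rows as there are minority rows.
    kept = rng.sample(majority, k=min(len(minority), len(majority)))
    idx = sorted(kept + minority)
    return [X[i] for i in idx], [y[i] for i in idx]


# Toy data (made up): 95 non-bankrupt firms (0) and 5 bankrupt firms (1),
# each described by a single placeholder feature.
X = [[float(i)] for i in range(100)]
y = [0] * 95 + [1] * 5

X_bal, y_bal = random_undersample(X, y)
print(Counter(y_bal))  # class counts after balancing
```

The balanced pair `(X_bal, y_bal)` would then be passed to any standard classifier. A guided method differs only in how the retained majority-class rows are chosen: by some selection rule rather than uniformly at random.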