
Application of Random Over Sampling Examples (ROSE) for an Effective Bankruptcy Prediction Model

Application of the ROSE Sampling Technique for an Effective Corporate Bankruptcy Prediction Model

  • Cheolhwi Ahn (안철휘), Graduate School of Business IT, Kookmin University
  • Hyunchul Ahn (안현철), Graduate School of Business IT, Kookmin University
  • Received : 2018.07.09
  • Accepted : 2018.08.21
  • Published : 2018.08.28

Abstract

When one class occurs far more frequently than the others in a classification problem, a data imbalance problem arises that distorts machine learning. Corporate bankruptcy prediction often suffers from data imbalance, because insolvent companies are generally very rare while solvent companies are very common. Mitigating this problem requires an appropriate sampling technique. Until now, oversampling techniques, which adjust the class distribution of a data set by resampling the minority class with replacement, have been widely used. However, they carry a risk of overfitting. Against this background, this study proposes applying ROSE (Random Over Sampling Examples), a technique introduced by Menardi and Torelli in 2014, to effective corporate bankruptcy prediction. Rather than duplicating existing training examples, ROSE synthesizes new ones from them, so it can improve the prediction accuracy of classifiers while avoiding the risk of overfitting. Specifically, our study proposes combining ROSE with SVM (support vector machine), which is widely regarded as one of the best binary classifiers. We applied the proposed method to a real-world bankruptcy prediction case from a major Korean bank and compared its performance with that of other sampling techniques. The experimental results showed that ROSE improved the prediction accuracy of SVM in bankruptcy prediction more than the other techniques did, and that the improvement was statistically significant. These results suggest that ROSE can be a good alternative for resolving data imbalance in social-science prediction problems beyond bankruptcy prediction.
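The abstract contrasts oversampling with replacement, which only copies minority examples, with ROSE, which synthesizes new ones. The sketch below illustrates that difference in plain Python. It is not the authors' code and not the R package ROSE [21]; it assumes a smoothed-bootstrap reading of Menardi and Torelli [1]: pick a class with probability 1/2, pick a training row of that class at random, and perturb it with Gaussian noise whose per-feature scale is a Silverman-style rule-of-thumb bandwidth. The function names `random_oversample` and `rose_sample`, the shrink factor `hmult`, and the toy data are hypothetical, and the SVM training step is omitted.

```python
import random
import statistics


def random_oversample(rows, labels, minority, seed=0):
    """Classic random oversampling: copy minority rows (with replacement)
    until the minority class is as large as the majority class."""
    rng = random.Random(seed)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    n_major = sum(1 for y in labels if y != minority)
    extra = [rng.choice(minority_rows) for _ in range(n_major - len(minority_rows))]
    # Every added row is an exact duplicate of an existing one.
    return rows + extra, labels + [minority] * len(extra)


def rose_sample(rows, labels, n_out, hmult=1.0, seed=0):
    """ROSE-style smoothed bootstrap (sketch, assumed bandwidth rule).

    Each synthetic example: choose a class with probability 1/2, choose a
    row of that class uniformly, then add Gaussian noise per feature."""
    rng = random.Random(seed)
    classes = sorted(set(labels))
    assert len(classes) == 2, "this sketch handles binary problems only"
    d = len(rows[0])
    by_class = {c: [r for r, y in zip(rows, labels) if y == c] for c in classes}

    # Assumed rule of thumb: h_q = hmult * (4 / ((d + 2) n))^(1 / (d + 4)) * sd_q,
    # computed separately for each class (n = class size).
    bandwidth = {}
    for c, members in by_class.items():
        n = len(members)
        factor = (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))
        sds = [statistics.stdev(col) if n > 1 else 0.0 for col in zip(*members)]
        bandwidth[c] = [hmult * factor * s for s in sds]

    new_rows, new_labels = [], []
    for _ in range(n_out):
        c = rng.choice(classes)           # balanced class choice (p = 1/2)
        center = rng.choice(by_class[c])  # existing row used as kernel center
        noisy = [x + rng.gauss(0.0, h) for x, h in zip(center, bandwidth[c])]
        new_rows.append(noisy)
        new_labels.append(c)
    return new_rows, new_labels


# Hypothetical toy data: 20 "solvent" rows (label 0), 3 "insolvent" rows (label 1).
majority = [[float(i), float(i % 5)] for i in range(20)]
minority = [[30.0, 1.0], [31.0, 2.0], [32.5, 1.5]]
rows = majority + minority
labels = [0] * 20 + [1] * 3

ros_rows, ros_labels = random_oversample(rows, labels, minority=1)
rose_rows, rose_labels = rose_sample(rows, labels, n_out=len(rows))
```

In this sketch, `random_oversample` returns 40 rows whose 17 additions are exact copies of the three minority rows, whereas `rose_sample` returns new points that lie near, but not on, the original rows. That difference is why a classifier such as an SVM trained on ROSE output is less able to simply memorize duplicated minority points, which is the overfitting risk the abstract attributes to oversampling with replacement.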

Keywords

Random Over Sampling Examples; Data Imbalance; Sampling; Bankruptcy Prediction

Acknowledgement

Supported by: National Research Foundation of Korea

References

  1. G. Menardi and N. Torelli, "Training and assessing classification rules with imbalanced data," Data Mining and Knowledge Discovery, Vol.28, No.1, pp.92-122, 2014. https://doi.org/10.1007/s10618-012-0295-5
  2. W. H. Beaver, "Financial ratios as predictors of failure," Journal of Accounting Research, Vol.4, pp.71-111, 1966. https://doi.org/10.2307/2490171
  3. E. I. Altman, "Financial ratios, discriminant analysis and the prediction of corporate bankruptcy," The Journal of Finance, Vol.23, No.4, pp.589-609, 1968. https://doi.org/10.1111/j.1540-6261.1968.tb00843.x
  4. J. A. Ohlson, "Financial ratios and the probabilistic prediction of bankruptcy," Journal of Accounting Research, Vol.18, No.1, pp.109-131, 1980. https://doi.org/10.2307/2490395
  5. M. E. Zmijewski, "Methodological issues related to the estimation of financial distress prediction models," Journal of Accounting Research, Vol.22, pp.59-82, 1984. https://doi.org/10.2307/2490859
  6. R. O. Edmister, "An empirical test of financial ratio analysis for small business failure prediction," Journal of Financial and Quantitative Analysis, Vol.7, No.2, pp.1477-1493, 1972. https://doi.org/10.2307/2329929
  7. M. D. Odom and R. Sharda, "A neural network model for bankruptcy prediction," In Proceedings of the International Joint Conference on Neural Networks, Vol.2, pp.163-168, 1990.
  8. K. Y. Tam and M. Y. Kiang, "Managerial applications of neural networks: the case of bank failure predictions," Management Science, Vol.38, No.7, pp.926-947, 1992. https://doi.org/10.1287/mnsc.38.7.926
  9. C. Serrano-Cinca, "Self-organizing neural networks for financial diagnosis," Decision Support Systems, Vol.17, No.3, pp.227-238, 1996. https://doi.org/10.1016/0167-9236(95)00033-X
  10. J. Yang and V. Honavar, "Feature subset selection using a genetic algorithm," IEEE Intelligent Systems and their Applications, Vol.13, No.2, pp.44-49, 1998. https://doi.org/10.1109/5254.671091
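  11. K. J. Kim and I. Han, "Corporate bankruptcy prediction using fuzzy neural networks," Journal of Intelligence and Information Systems, Vol.7, No.1, pp.135-146, 2001. (in Korean)
  12. Y. C. Lee, "A comparison of the bankruptcy prediction performance of artificial neural networks and support vector machines," Proceedings of the Korea Intelligent Information Systems Society Spring Conference, pp.211-218, 2004. (in Korean)
  13. P. Kang and S. Cho, "Under-sampling based ensemble SVMs for resolving data imbalance," Proceedings of the Korean Institute of Industrial Engineers Spring Joint Conference, pp.291-298, 2006. (in Korean)
  14. J. D. Lee and J. H. Lee, "A K-means clustering based SVM ensemble method for resolving the data imbalance problem," Proceedings of the KIISE Korea Computer Congress, pp.297-799, 2014. (in Korean)
  15. T. H. Kim and H. Ahn, "A Hybrid Under-sampling Approach for Better Bankruptcy Prediction," Journal of Intelligence and Information Systems, Vol.21, No.2, pp.173-190, 2015. https://doi.org/10.13088/jiis.2015.21.2.173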
  16. N. Japkowicz, "The Class Imbalance Problem: Significance and Strategies," In Proceedings of the International Conference on Artificial Intelligence, pp.111-114, 2000.
  17. N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: Synthetic Minority Over-sampling Technique," Journal of Artificial Intelligence Research, Vol.16, pp.321-357, 2002. https://doi.org/10.1613/jair.953
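  18. J. D. Lee and J. H. Lee, "A deep learning method for effective learning of imbalanced data," Proceedings of the Korean Institute of Intelligent Systems Spring Conference, Vol.25, No.1, pp.113-114, 2015. (in Korean)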
  19. G. E. Batista, R. C. Prati, and M. C. Monard, "A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data," ACM SIGKDD Explorations Newsletter, Vol.6, No.1, pp.20-29, 2004. https://doi.org/10.1145/1007730.1007735
  20. M. Kubat and S. Matwin, "Addressing the curse of imbalanced training sets: one-sided selection," Proceedings of the Fourteenth International Conference on Machine Learning, pp.179-186, 1997.
  21. N. Lunardon, G. Menardi, and N. Torelli, ROSE: A Package for Binary Imbalanced Learning, r-project.org, 2014.
  22. B. Efron and R. Tibshirani, An Introduction to the Bootstrap, Chapman and Hall, 1993.
  23. F. E. J. Tay and L. J. Cao, "Modified support vector machines in financial time series forecasting," Neurocomputing, Vol.48, pp.847-861, 2002. https://doi.org/10.1016/S0925-2312(01)00676-2