Corporate Default Prediction Model Using Deep Learning Time Series Algorithm, RNN and LSTM

Verifying the Usefulness of a Corporate Default Prediction Model Applying Deep Learning Time Series Algorithms

  • Received : 2018.10.24
  • Accepted : 2018.11.26
  • Published : 2018.12.31

Abstract

Corporate defaults affect not only the stakeholders of the bankrupt company, including its managers, employees, creditors, and investors, but also the local and national economy. Before the Asian financial crisis, the Korean government analyzed only SMEs and tried to improve the forecasting power of a single default prediction model rather than developing a variety of corporate default models. As a result, even large corporations, the so-called 'chaebol enterprises', went bankrupt. Even after that, the analysis of past corporate defaults remained focused on specific variables, and when the government carried out restructuring immediately after the global financial crisis, it focused only on a few main variables such as the 'debt ratio'. A multifaceted study of corporate default prediction models is essential to protect diverse interests and to avoid a sudden total collapse like the 'Lehman Brothers Case' of the global financial crisis. The key variables used to predict corporate defaults vary over time. Comparing Deakin's (1972) study with the analyses of Beaver (1967, 1968) and Altman (1968) confirms that the major factors affecting corporate failure have changed. Grice's (2001) study also found changes in the importance of predictive variables when re-examining Zmijewski's (1984) and Ohlson's (1980) models. However, past studies use static models, and most of them do not consider changes that occur over time. Therefore, to construct consistent prediction models, the time-dependent bias must be compensated for with a time series algorithm that reflects dynamic change. Taking the global financial crisis, which had a significant impact on Korea, as its reference point, this study uses 10 years of annual corporate data from 2000 to 2009. The data are divided into training, validation, and test sets covering 7, 2, and 1 years, respectively.
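
The 7/2/1 split by year can be sketched as follows. This is a minimal illustration, not the authors' code: the year boundaries (2000~2006, 2007~2008, 2009) are the ones stated in the abstract, while the record layout and the names `split_by_year`, `firm`, `year`, and `default` are hypothetical.

```python
# Year ranges as stated in the abstract; the record layout is an assumption.
TRAIN_YEARS = range(2000, 2007)  # 2000-2006, before the financial crisis
VALID_YEARS = range(2007, 2009)  # 2007-2008, includes the crisis period
TEST_YEARS = range(2009, 2010)   # 2009


def split_by_year(records):
    """Assign each firm-year record to train, valid, or test by its year."""
    splits = {"train": [], "valid": [], "test": []}
    for rec in records:
        year = rec["year"]
        if year in TRAIN_YEARS:
            splits["train"].append(rec)
        elif year in VALID_YEARS:
            splits["valid"].append(rec)
        elif year in TEST_YEARS:
            splits["test"].append(rec)
        # Records outside 2000-2009 are ignored.
    return splits


# Toy data: one hypothetical firm observed once per year.
records = [{"firm": "A", "year": y, "default": 0} for y in range(2000, 2010)]
splits = split_by_year(records)

# Final refit set: training and validation years combined (2000-2008).
refit = splits["train"] + splits["valid"]
print({name: len(rows) for name, rows in splits.items()}, len(refit))
```

Splitting by calendar year rather than at random keeps every validation and test observation later in time than the data the model was trained on.
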
To construct a bankruptcy model that stays consistent as time passes, we first train a deep learning time series model on data from before the financial crisis (2000~2006). The parameters of the existing models and of the deep learning time series algorithms are then tuned on validation data that include the financial crisis period (2007~2008), so that each model shows a pattern on the validation data similar to its results on the training data and has excellent predictive power. After that, each bankruptcy prediction model is rebuilt on the combined training and validation data (2000~2008), applying the optimal parameters found in the validation step. Finally, the corporate default prediction models trained on these nine years are evaluated and compared on the test data (2009), and the results demonstrate the usefulness of the corporate default prediction model based on the deep learning time series algorithm. In addition, Lasso regression is added to the existing variable selection methods (multiple discriminant analysis and the logit model), and the deep learning time series model is shown to be useful for robust corporate default prediction across all three resulting sets of variables. Bankruptcy is defined as in Lee (2015). The independent variables include financial information such as the financial ratios used in previous studies. Multivariate discriminant analysis, the logit model, and the Lasso regression model are used to select the optimal variable group. The multivariate discriminant analysis model proposed by Altman (1968), the logit model proposed by Ohlson (1980), non-time-series machine learning algorithms, and deep learning time series algorithms are compared. Corporate data have three limitations: 'nonlinear variables', 'multi-collinearity' among variables, and 'lack of data'.
To address these limitations, the logit model handles the nonlinearity, the Lasso regression model resolves the multi-collinearity problem, and the deep learning time series algorithm, combined with a variable data generation method, compensates for the lack of data. Big data technology, a leading technology of the future, is moving from simple human analysis to automated AI analysis, and ultimately toward intertwined AI applications. Although the study of corporate default prediction models using time series algorithms is still in its early stages, the deep learning algorithm is much faster than regression analysis at corporate default prediction modeling and also has better predictive power. Through the Fourth Industrial Revolution, the current Korean government and governments overseas are working hard to bring such technology into national and social systems and everyday life. Yet deep learning time series research for the financial industry remains insufficient. This is an initial study of deep learning time series analysis of corporate defaults, and we hope it will serve as comparative material for non-specialists beginning research that combines financial data with deep learning time series algorithms.
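
The abstract does not spell out the variable data generation method, so the following is only one plausible reading, sketched under stated assumptions: each firm's yearly history is cut into consecutive windows of varying length, so that a single firm yields several training sequences for an RNN or LSTM. The function name `make_sequences`, the window bounds, the labeling rule (label of the final year in the window), and the toy ratios are all hypothetical.

```python
def make_sequences(history, min_len=2, max_len=4):
    """Build variable-length sequences from one firm's yearly history.

    `history` is a list of (features, label) tuples ordered by year.
    Every run of min_len..max_len consecutive years becomes one sample,
    labeled with the label of its final year (an assumption of this sketch).
    """
    samples = []
    for end in range(len(history)):
        for length in range(min_len, max_len + 1):
            start = end - length + 1
            if start < 0:
                break  # not enough earlier years for this window length
            window = history[start:end + 1]
            features = [f for f, _ in window]
            label = window[-1][1]
            samples.append((features, label))
    return samples


# Hypothetical firm: five years of two financial ratios, defaulting in the last year.
history = [
    ([0.40, 0.10], 0),
    ([0.45, 0.08], 0),
    ([0.55, 0.03], 0),
    ([0.70, -0.02], 0),
    ([0.90, -0.10], 1),
]
samples = make_sequences(history)
print(len(history), "firm-years ->", len(samples), "sequences")
```

Because the resulting sequences differ in length, they would need padding or length-based batching before being fed to a recurrent network.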

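This study uses a total of 10 years of annual corporate data, centered on the global financial crisis, which had a large economic impact on Korea. First, with the goal of building a default model that remains consistent as times change, we train on data from before the financial crisis (2000~2006). Then, through parameter tuning, we adjust the model so that the validation data, which include the financial crisis period (2007~2008), show a pattern similar to the results on the training data and the model has excellent predictive power. Next, we combine the training and validation data (2000~2008), rebuild the model with the same parameters as in validation, and finally verify, from the results of the final trained model on the test data (2009), that the corporate default prediction model based on deep learning time series algorithms is useful. As in Lee (2015), default is defined as delisting for reasons of poor performance. The independent variables include the financial ratio variables used in previous studies as well as other financial information. Multivariate discriminant analysis, the logit model, and the Lasso regression model are then used to select the optimal variable group. The corporate default prediction methods are the multiple discriminant analysis model proposed by Altman (1968), the logit model proposed by Ohlson (1980), non-time-series machine learning default prediction models, and deep learning time series algorithms. Corporate data have the limitations of 'nonlinear variables', 'multi-collinearity' among variables, and 'insufficient data'. Accordingly, the logit model addresses 'nonlinearity', the Lasso regression model addresses 'multi-collinearity', and the deep learning time series algorithm, which uses a variable data generation method, compensates for the insufficient data. The current government and governments overseas are working to encompass national and social systems and everyday life as a whole through the Fourth Industrial Revolution. That is, although deep learning research using big data is now active across many industries, research for the financial industry is still lacking. This study is therefore an early paper applying deep learning time series analysis to corporate default, and we hope it will be used as comparative material by non-specialists starting research that combines financial data with deep learning time series algorithms.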

  • Data Generation Method for Time-Series Deep Learning Algorithms (image: JJSHBB_2018_v24n4_1_f0001.png)
  • RNN Model Diagram (image: JJSHBB_2018_v24n4_1_f0002.png)
  • LSTM Model Diagram (image: JJSHBB_2018_v24n4_1_f0003.png)
  • MLP Model Diagram (image: JJSHBB_2018_v24n4_1_f0004.png)
  • Original Models (image: JJSHBB_2018_v24n4_1_t0001.png)
  • Non-Time-Series Supervised Algorithms (image: JJSHBB_2018_v24n4_1_t0002.png)
  • Evaluation Index (image: JJSHBB_2018_v24n4_1_t0003.png)
  • Confusion Matrix (image: JJSHBB_2018_v24n4_1_t0004.png)
  • Advanced Evaluation Index (image: JJSHBB_2018_v24n4_1_t0005.png)
  • Original data distribution (left) and data distribution after preprocessing (right) (image: JJSHBB_2018_v24n4_1_t0006.png)
  • Distribution Before/After Time-Series Data Generation (image: JJSHBB_2018_v24n4_1_t0007.png)
  • Feature Selection Result from applying F-test and T-test (image: JJSHBB_2018_v24n4_1_t0008.png)
  • Feature Selection by Correlation Analysis (image: JJSHBB_2018_v24n4_1_t0009.png)
  • Feature Selection Result from applying Logit Model (image: JJSHBB_2018_v24n4_1_t0010.png)
  • Feature Selection Result from applying Lasso Regression (image: JJSHBB_2018_v24n4_1_t0011.png)
  • Feature Selection Summary from 3 models (image: JJSHBB_2018_v24n4_1_t0012.png)
  • Train/validation result with features selected by Multiple Discriminant Analysis (image: JJSHBB_2018_v24n4_1_t0013.png)
  • Train/validation result with features selected by Multiple Discriminant Analysis (image: JJSHBB_2018_v24n4_1_t0014.png)
  • Test result with features selected by Multiple Discriminant Analysis (image: JJSHBB_2018_v24n4_1_t0015.png)
  • Train/validation result with features selected by Logit model (image: JJSHBB_2018_v24n4_1_t0016.png)
  • Test result with features selected by Logit model (image: JJSHBB_2018_v24n4_1_t0017.png)
  • Train/validation result with features selected by Lasso Regression (image: JJSHBB_2018_v24n4_1_t0018.png)
  • Test result with features selected by Lasso Regression (image: JJSHBB_2018_v24n4_1_t0019.png)
  • Test result based on ROC AUC (image: JJSHBB_2018_v24n4_1_t0020.png)
  • Test result based on PR AUC (image: JJSHBB_2018_v24n4_1_t0021.png)
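
The captions name ROC AUC and PR AUC as test metrics. The sketch below shows one common way to compute both from predicted default scores; it is an illustration under assumptions, not the paper's evaluation code. PR AUC is approximated here by average precision, and the function names, labels, and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Probability that a random defaulter outscores a random non-defaulter."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Count pairwise wins; a tie counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    Assumes at least one positive label and no tied scores.
    """
    ranked = sorted(zip(scores, labels), reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / hits


# Toy test set: 2 defaults among 8 firms (1 = default).
labels = [1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
print(round(auc, 3), round(ap, 3))
```

When defaults are rare, the precision-recall view is often reported alongside ROC AUC because it is more sensitive to false positives among the few predicted defaulters.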

References

  1. Addal, S., "Financial forecasting using machine learning", African Institute for Mathematical Science, (2016), 1-32.
  2. Ahn, S. M., and J. W. Park, "Corporate Bankruptcy Prediction Using Financial Ratios: Focused on the Korean Manufacturing Companies Audited by External Auditors", Korean Management Review, Vol.43, No.3, (2014), 639-669.
  3. Altman, E. I., "Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy", Journal of Finance, Vol.23, No.4, (1968), 589-609. https://doi.org/10.1111/j.1540-6261.1968.tb00843.x
  4. Bae, J. K., "An Integrated Approach to Predict Corporate Bankruptcy with Voting Algorithms and Neural Networks", Korean Business Review, Vol.3, No.2, (2010), 79-101.
  5. Beaver, W. H., "Financial ratios as predictors of bankruptcy", Journal of Accounting Research, Supplement, (1966), 71-102.
  6. Deakin, E. B., "A Discriminant Analysis of Predictors of Business Failure", Journal of Accounting Research, Vol.10, No.1, (1972), 167-179. https://doi.org/10.2307/2490225
  7. Grice, J. S. and M. T. Dugan, "The Limitations of Bankruptcy Prediction Models: Some Cautions for the Researcher", Review of Quantitative Finance and Accounting, Vol.17, No.2, (2001), 151-166. https://doi.org/10.1023/A:1017973604789
  8. Hong, S. H. and K. S. Shin, "Using GA based Input Selection Method for Artificial Neural Network Modeling; Application to Bankruptcy Prediction", Journal of Intelligence and Information Systems, Vol.9, No.1, (2003), 227-249.
  9. Jo, N. O., H. J. Kim and K. S. Shin, "Bankruptcy Type Prediction Using A Hybrid Artificial Neural Networks Model", Journal of Intelligence and Information Systems, Vol.21, No.3, (2015), 79-99. https://doi.org/10.13088/jiis.2015.21.3.79
  10. Jo, N. O. and K. S. Shin, "Bankruptcy Prediction Modeling Using Qualitative Information Based on Big Data Analytics", Journal of Intelligence and Information Systems, Vol.22, No.2, (2016), 33-56. https://doi.org/10.13088/jiis.2016.22.2.033
  11. Kapinos, P., and O.A. Mitnik, "A Top-Down Approach to Stress-Testing Banks", Journal of Financial Services Research, Vol.49, No.2, (2016), 229-264. https://doi.org/10.1007/s10693-015-0228-8
  12. Kim, G. P., H. K. Lee, J. H. Kim and H. J. Kwon, "The Fourth Industrial Revolution in Major Countries and Growth Strategy of Korea: U.S., Germany and Japan Cases", Korea Institute for International Economic Policy, Policy Analysis, (2017).
  13. Kim, J. B. and J. S. Lee, "Usability of Cash Flow Data in Predicting Bankruptcy Using Artificial Intelligence Techniques: The Case of Small and Medium Sized Firms", Korean Journal of Business Administration, No.26, (2000), 229-250.
  14. Kim, M. J., "Ensemble Learning for Solving Data Imbalance in Bankruptcy Prediction", Journal of Intelligence and Information Systems, Vol.15, No.3, (2009), 1-15.
  15. Kim, M. J., H. B. Kim and D. K. Kang, "Optimizing SVM Ensembles Using Genetic Algorithms in Bankruptcy Prediction", Journal of information and communication convergence engineering, Vol.8, No.4, (2010), 370-376. https://doi.org/10.6109/jicce.2010.8.4.370
  16. Kim, M. J., "Ensemble Learning with Support Vector Machines for Bond Rating", Journal of Intelligence and Information Systems, Vol.18, No.2, (2012), 29-45. https://doi.org/10.13088/JIIS.2012.18.2.029
  17. Kim, S. B., P. Ji and K. J. Jo, "The Analysis on the Causes of Corporate Bankruptcy with the Bankruptcy Prediction Model", Journal of Market Economy, Vol.40, No.1, (2011), 85-106.
  18. Kim, S. J. and H. C. Ahn, "Estimation Model applied Random Forest for Corporate Bond Ratings", Journal of Intelligence and Information Systems, Spring Conference, (2014), 371-376.
  19. Kim, Y. D., C. H. Jun and H. S. Lee, "A new classification method using penalized partial Least squares", Journal of the Korean Data and Information Science Society, Vol.22, No.5, (2011), 931-940.
  20. Kim, Y. T. and M. H. Kim, "An Artificial Neural Network Model for Business Failure Prediction", Korean Journal of Accounting Research, Vol.6, No.1, (2001), 275-294.
  21. Kwon, H. K., D. K. Lee and M. S. Shin, "Dynamic forecasts of bankruptcy with Recurrent Neural Network model", Journal of Intelligence and Information Systems, Vol.23, No.3, (2017), 139-153. https://doi.org/10.13088/JIIS.2017.23.3.139
  22. Lee, I. R. and D. C. Kim, "Evaluation of Bankruptcy Prediction Model Using Accounting Information and Market Information", Journal of Korean Finance Association, Vol.28, No.4, (2015), 626-666.
  23. Lee, J. S. and J. H. Han, "Test of Non-Financial Information in Bankruptcy Prediction using Artificial Neural Network - The Case of Small and Medium-Sized Firms", Journal of Intelligence and Information Systems, Vol.1, No.1, (1995), 123-134.
  24. Lee, K. C., "Comparative Study on the Bankruptcy Prediction Power of Statistical Model and AI Models: MDA, Inductive Learning, Neural Network", Journal of the Korean Operations Research and Management Science Society, Vol.18, No.2, (1993), 57-81.
  25. Min, S. H., "Bankruptcy prediction using an improved bagging ensemble", Journal of Intelligence and Information Systems, Vol.20, No.4, (2014), 121-139. https://doi.org/10.13088/JIIS.2014.20.4.121
  26. Min, S. H., "Simultaneous optimization of KNN ensemble model for bankruptcy prediction", Journal of Intelligence and Information Systems, Vol.22, No.1, (2016), 139-157. https://doi.org/10.13088/JIIS.2016.22.1.139
  27. No, G. M. and W. G. Han, "ICT Policy Direction After 100-days Moon Jae-in government launched", National Information Society Agency, Hot Issue Report, (2017).
  28. Ohlson, J. A., "Financial Ratios and the Probabilistic Prediction of Bankruptcy", Journal of Accounting Research, (1980), 109-131.
  29. Park, J. Y., Y. W. Kim and M. Y. Lee, "A Prediction Model of Small Business Bankruptcy", Journal of Korean Logos Management, Conference, (2007), 202-204.
  30. Presidential Committee on the Fourth Industrial Revolution, "Data Industry Promotion Strategy - I-KOREA 4.0 Data Field Plan, I-DATA+", (2017).
  31. Shapiro, S. S. and M. B. Wilk, "An analysis of variance test for normality (complete samples)", Biometrika, Vol.52, (1965), 591-611. https://doi.org/10.1093/biomet/52.3-4.591
  32. Swedberg, R., "The Structure of Confidence and the Collapse of Lehman Brothers", Research in the Sociology of Organizations, (2009).
  33. Tibshirani, R., "Regression Shrinkage and Selection via the Lasso", Journal of the Royal Statistical Society, Series B (Methodological), Vol.58, No.1, (1996), 267-288. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x
  34. Wang, H., Q. Xu and L. Zhou, "Large Unbalanced Credit Scoring Using Lasso-Logistic Regression Ensemble", PLoS One, San Francisco, Vol.10, No.2, (2015).
  35. Welch, B. L., "'Student' and Small Sample Theory", Journal of the American Statistical Association, Vol.53, No.284, (1958), 777-788.
  36. Yeh, S., C. Wang and M. Tsai, "Corporate default prediction via deep learning", Wireless and Optical Communication Conference, Vol.24, 1-8.
  37. Zmijewski, M. E., "Methodological issues related to the estimation of financial distress prediction models", Studies on Current Econometric Issues in Accounting Research, Vol.22, (1984), 59-82.

Cited by

  1. Predicting Export Credit Guarantee Accidents Using Machine Learning, vol.27, pp.1, 2021, https://doi.org/10.13088/jiis.2021.27.1.083