Bankruptcy Prediction Modeling Using Qualitative Information Based on Big Data Analytics

빅데이터 기반의 정성 정보를 활용한 부도 예측 모형 구축

  • Received : 2016.05.03
  • Accepted : 2016.06.13
  • Published : 2016.06.30


Many researchers have focused on developing bankruptcy prediction models with improved performance, using statistical methods such as multiple discriminant analysis (MDA) and logit analysis, or artificial intelligence techniques such as artificial neural networks (ANN), decision trees, and support vector machines (SVM). Most bankruptcy prediction models in academic studies have used financial ratios as the main input variables. The bankruptcy of a firm is associated with both the firm's financial state and the external economic situation. However, the inclusion of qualitative information, such as the economic atmosphere, has not been actively discussed, even though relying only on financial ratios has drawbacks. Accounting information, such as financial ratios, is based on past data and is usually determined one year before bankruptcy. Thus, a time lag exists between the closing of the financial statements and the point of credit evaluation. In addition, financial ratios do not capture environmental factors, such as external economic conditions. Therefore, financial ratios alone may be insufficient for constructing a bankruptcy prediction model, because they essentially reflect past internal accounting information while neglecting recent information. Qualitative information must therefore be added to the conventional bankruptcy prediction model to supplement accounting information. Because an analytic mechanism for obtaining and processing qualitative information from various sources has been lacking, previous studies have made only limited use of qualitative information. Recently, however, big data analytics, such as text mining techniques, has drawn much attention in academia and industry as the amount of unstructured text data available on the web increases. A few previous studies have sought to adopt big data analytics in business prediction modeling.
Nevertheless, the use of qualitative information from the web for business prediction modeling is still at an early stage and is restricted to a few applications, such as stock prediction and movie revenue prediction. Thus, it is necessary to apply big data analytics techniques, such as text mining, to a wider range of business prediction problems, including credit risk evaluation. Because unstructured text data is complex to manage and process, analytic methods are required for handling qualitative information represented in that form. This study proposes a bankruptcy prediction model for Korean small- and medium-sized construction firms that uses both quantitative information, such as financial ratios, and qualitative information acquired from economic news articles. The performance of the proposed method depends on how well qualitative information is transformed into quantitative information suitable for incorporation into the bankruptcy prediction model. We employ big data analytics techniques, especially text mining, as the mechanism for processing qualitative information. A sentiment index is derived at the industry level from a large amount of text data to quantify the external economic atmosphere represented in the media. The proposed method involves keyword-based sentiment analysis that uses a domain-specific sentiment lexicon to extract sentiment from economic news articles. The generated sentiment lexicon is designed to represent sentiment for the construction business by considering the relationship between an occurring term and the actual economic condition of the industry, rather than the inherent semantics of the term. The experimental results showed that incorporating qualitative information based on big data analytics into the traditional bankruptcy prediction model based on accounting information is effective in enhancing predictive performance.
The sentiment variable extracted from economic news articles had an impact on corporate bankruptcy. In particular, the negative sentiment variable improved the accuracy of bankruptcy prediction, because the bankruptcy of construction firms is sensitive to poor economic conditions. The bankruptcy prediction model using qualitative information based on big data analytics contributes to the field in that it reflects not only relatively recent information but also environmental factors, such as external economic conditions.
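
The keyword-based sentiment analysis described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how term polarity is scored, so the smoothed log-odds scoring, the whitespace tokenizer, the threshold, and all function names and sample headlines here are assumptions made for illustration. The sketch scores each term by how often it appears in articles from good versus bad periods for the industry (rather than by its dictionary meaning), then reports the share of positive and negative terms in new articles as an industry-level sentiment index.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplistic whitespace tokenizer (assumption); Korean news text would
    # need a morphological analyzer instead.
    return text.lower().split()


def build_lexicon(labeled_articles, min_count=2):
    """Score terms by smoothed log-odds of appearing in 'good' vs. 'bad' periods.

    labeled_articles: list of (text, condition), condition in {"good", "bad"},
    where the label reflects the industry's actual economic condition.
    """
    good, bad = Counter(), Counter()
    n_good = n_bad = 0
    for text, condition in labeled_articles:
        terms = set(tokenize(text))  # count each term once per article
        if condition == "good":
            good.update(terms)
            n_good += 1
        else:
            bad.update(terms)
            n_bad += 1
    lexicon = {}
    for term in set(good) | set(bad):
        if good[term] + bad[term] < min_count:
            continue  # skip terms that are too rare to score
        p_good = (good[term] + 1) / (n_good + 2)
        p_bad = (bad[term] + 1) / (n_bad + 2)
        lexicon[term] = math.log(p_good / p_bad)
    return lexicon


def sentiment_index(articles, lexicon, threshold=0.5):
    """Return (positive_share, negative_share) of tokens across the articles."""
    pos = neg = total = 0
    for text in articles:
        for token in tokenize(text):
            total += 1
            score = lexicon.get(token, 0.0)
            if score > threshold:
                pos += 1
            elif score < -threshold:
                neg += 1
    if total == 0:
        return 0.0, 0.0
    return pos / total, neg / total


# Hypothetical training headlines labeled by industry condition.
training = [
    ("housing orders surge as construction demand recovers", "good"),
    ("construction orders surge on new housing demand", "good"),
    ("unsold housing piles up as construction firms default", "bad"),
    ("construction firms default amid unsold housing glut", "bad"),
]
lexicon = build_lexicon(training)
pos_share, neg_share = sentiment_index(
    ["firms default as unsold housing rises"], lexicon
)
print(pos_share, neg_share)
```

In this toy data, terms such as "housing" and "construction" occur equally in both periods and receive a neutral score, while "default" and "unsold" are scored negative. Under the same assumptions, the negative share for a given period could be appended to a firm's financial ratios as an additional input variable to the prediction model.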


Bankruptcy Prediction; Big Data Analytics; Text Mining; Sentiment Analysis; Artificial Neural Networks




Supported by : National Research Foundation of Korea