Simultaneous Optimization of a KNN Ensemble Model for Bankruptcy Prediction

Min, Sung-Hwan
민성환

  • Received : 2016.03.02
  • Accepted : 2016.03.14
  • Published : 2016.03.31

Abstract

Bankruptcy involves considerable costs and can significantly affect a country's economy, which makes bankruptcy prediction an important issue. Over the past several decades, many researchers have addressed topics associated with bankruptcy prediction. Early research employed conventional statistical methods such as univariate analysis, discriminant analysis, multiple regression, and logistic regression. Later studies began using artificial intelligence techniques such as inductive learning, neural networks, and case-based reasoning. Currently, ensemble models are being used to enhance the accuracy of bankruptcy prediction. Ensemble classification combines multiple classifiers to obtain more accurate predictions than those of the individual models, and ensemble learning techniques are known to be very useful for improving the generalization ability of a classifier. To enhance the generalization ability of an ensemble, its base classifiers must be as accurate and as diverse as possible. Commonly used methods for constructing ensemble classifiers include bagging, boosting, and random subspace. The random subspace method diversifies the base classifiers by training each ensemble member on a feature subset chosen at random from the original feature set; the members' predictions are then combined by an aggregation method. The k-nearest neighbors (KNN) classifier is robust to variations in the dataset but very sensitive to changes in the feature space, which makes KNN a good base classifier for the random subspace method. The KNN random subspace ensemble has been shown to be very effective at improving on an individual KNN model.
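The random subspace idea described above can be illustrated with a minimal sketch in plain Python. It is not the author's implementation: the function names (`knn_predict`, `build_random_subspaces`, `ensemble_predict`), the toy data, the squared Euclidean distance, and majority voting as the aggregation method are all assumptions made for illustration.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k, features):
    """Classify x by majority vote among its k nearest training rows,
    using squared Euclidean distance on the given feature indices only."""
    dists = []
    for row, label in zip(train_X, train_y):
        d = sum((row[f] - x[f]) ** 2 for f in features)
        dists.append((d, label))
    dists.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def build_random_subspaces(n_features, n_members, subspace_size, seed=0):
    """Draw one random feature subset (without replacement) per ensemble member."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_features), subspace_size))
            for _ in range(n_members)]


def ensemble_predict(train_X, train_y, x, k, subspaces):
    """Aggregate the members' predictions by majority vote (an assumed rule)."""
    member_votes = [knn_predict(train_X, train_y, x, k, feats)
                    for feats in subspaces]
    return Counter(member_votes).most_common(1)[0][0]


# Toy data (hypothetical): four features, class 0 near 0.0 and class 1 near 1.0.
train_X = [
    [0.1, 0.2, 0.0, 0.1], [0.0, 0.1, 0.2, 0.0], [0.2, 0.0, 0.1, 0.2],
    [0.9, 1.0, 0.8, 0.9], [1.0, 0.9, 1.0, 0.8], [0.8, 0.8, 0.9, 1.0],
]
train_y = [0, 0, 0, 1, 1, 1]

# Five members, each seeing two of the four features; the same k for all members.
subspaces = build_random_subspaces(n_features=4, n_members=5, subspace_size=2)
print(subspaces)
print(ensemble_predict(train_X, train_y, [0.05, 0.1, 0.1, 0.1], 3, subspaces))
```

In this sketch every member shares one value of k and the subsets are purely random, which is the baseline that the study sets out to improve.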
The k parameter of the KNN base classifiers and the feature subsets selected for them play an important role in determining the performance of the KNN ensemble model. However, few studies have focused on optimizing the k parameter and feature subsets of base classifiers in the ensemble. This study proposes a new ensemble method that improves the performance of the KNN ensemble model by simultaneously optimizing both the k parameters and the feature subsets of the base classifiers. A genetic algorithm was used to optimize the KNN ensemble model and improve its prediction accuracy. The proposed model was applied to a bankruptcy prediction problem using a real dataset of Korean companies. The research data comprised 1,800 externally non-audited firms: 900 that filed for bankruptcy and 900 that did not. Initially, the dataset consisted of 134 financial ratios. Prior to the experiments, 75 financial ratios were selected by an independent-samples t-test, with each financial ratio as the input variable and bankruptcy status as the output variable. Of these, 24 financial ratios were selected using logistic regression with backward feature selection. The complete dataset was separated into two parts: training and validation. The training dataset was further divided into two portions: one for training the model and the other for guarding against overfitting. Prediction accuracy on the latter portion was used as the fitness value. The validation dataset was used to evaluate the effectiveness of the final model. A 10-fold cross-validation was implemented to compare the classification accuracy of the proposed model with that of other models. The Q-statistic values and average classification accuracies of the base classifiers were also investigated.
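One way to picture the simultaneous optimization is a chromosome that stores, for each base classifier, a k value and a binary feature mask, with fitness measured as ensemble accuracy on the held-out portion of the training data. The sketch below is a hedged illustration only: the abstract does not specify the encoding or the genetic operators, so `K_CHOICES`, truncation selection, member-level uniform crossover, the mutation scheme, and the toy data are all assumptions.

```python
import random
from collections import Counter

K_CHOICES = [1, 3, 5]  # assumed candidate k values, odd to avoid tied votes


def knn_predict(train, x, k, mask):
    """k-NN vote using only the features whose mask bit is 1."""
    feats = [i for i, bit in enumerate(mask) if bit]
    nearest = sorted(
        train, key=lambda rc: sum((rc[0][f] - x[f]) ** 2 for f in feats))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def fitness(chrom, train, holdout):
    """Majority-vote ensemble accuracy on the held-out portion."""
    correct = 0
    for x, y in holdout:
        votes = Counter(knn_predict(train, x, k, mask) for k, mask in chrom)
        correct += votes.most_common(1)[0][0] == y
    return correct / len(holdout)


def repair(mask, rng):
    """Guarantee that at least one feature is selected."""
    if not any(mask):
        mask[rng.randrange(len(mask))] = 1
    return mask


def random_member(rng, n_features):
    mask = repair([rng.randint(0, 1) for _ in range(n_features)], rng)
    return (rng.choice(K_CHOICES), mask)


def mutate(member, rng, rate):
    """Flip each mask bit and resample k with probability `rate`."""
    k, mask = member
    mask = repair([1 - b if rng.random() < rate else b for b in mask], rng)
    if rng.random() < rate:
        k = rng.choice(K_CHOICES)
    return (k, mask)


def evolve(train, holdout, n_features, n_members=3, pop_size=8,
           generations=10, rate=0.1, seed=0):
    rng = random.Random(seed)
    pop = [[random_member(rng, n_features) for _ in range(n_members)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged.
        pop.sort(key=lambda c: fitness(c, train, holdout), reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover at the member level, followed by mutation.
            children.append([mutate(rng.choice(pair), rng, rate)
                             for pair in zip(a, b)])
        pop = parents + children
    return max(pop, key=lambda c: fitness(c, train, holdout))


# Toy data (hypothetical): features 0 and 1 carry the signal, 2 and 3 are noise.
train = [
    ([0.1, 0.2, 0.9, 0.3], 0), ([0.2, 0.1, 0.1, 0.8], 0),
    ([0.0, 0.3, 0.5, 0.5], 0), ([0.3, 0.0, 0.7, 0.1], 0),
    ([0.9, 0.8, 0.2, 0.4], 1), ([0.8, 1.0, 0.9, 0.9], 1),
    ([1.0, 0.7, 0.4, 0.2], 1), ([0.7, 0.9, 0.6, 0.6], 1),
]
holdout = [
    ([0.1, 0.1, 0.8, 0.8], 0), ([0.2, 0.2, 0.2, 0.2], 0),
    ([0.9, 0.9, 0.1, 0.1], 1), ([0.8, 0.8, 0.9, 0.9], 1),
]

best = evolve(train, holdout, n_features=4)
for k, mask in best:
    print("k =", k, "mask =", mask)
print("holdout accuracy:", fitness(best, train, holdout))
```

Scoring chromosomes on data that the KNN members did not store mirrors the abstract's use of a separate portion of the training set for the fitness value; the final model would still need to be judged on the untouched validation set.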
The experimental results showed that the proposed model outperformed other models, such as the single model and random subspace ensemble model.
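The Q-statistic mentioned in the abstract is a pairwise diversity measure. In its standard form, for two classifiers i and j, Q = (N11·N00 − N01·N10) / (N11·N00 + N01·N10), where N11 counts samples both classify correctly, N00 samples both misclassify, and N10 and N01 samples only one of them gets right. Values near 1 indicate classifiers that err on the same samples, and lower values indicate more diversity. The sketch below assumes each classifier is summarized as a list of 0/1 correctness flags; the function names and the choice to return NaN when the denominator is zero are illustrative assumptions.

```python
from itertools import combinations


def q_statistic(correct_i, correct_j):
    """Pairwise Q-statistic from two equal-length lists of 0/1 correctness flags."""
    n11 = n00 = n10 = n01 = 0
    for a, b in zip(correct_i, correct_j):
        if a and b:
            n11 += 1   # both correct
        elif not a and not b:
            n00 += 1   # both wrong
        elif a:
            n10 += 1   # only classifier i correct
        else:
            n01 += 1   # only classifier j correct
    denominator = n11 * n00 + n01 * n10
    if denominator == 0:
        return float("nan")  # Q is undefined here; NaN is an assumed convention
    return (n11 * n00 - n01 * n10) / denominator


def average_q(correct_flags):
    """Mean Q over all pairs of base classifiers in the ensemble."""
    pairs = list(combinations(correct_flags, 2))
    return sum(q_statistic(a, b) for a, b in pairs) / len(pairs)


# Hypothetical correctness flags for two classifiers on six samples.
clf_a = [1, 1, 0, 0, 1, 0]
clf_b = [1, 0, 0, 1, 1, 0]
print(q_statistic(clf_a, clf_b))
```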

Keywords

Random Subspace; Bankruptcy Prediction; Ensemble; Genetic Algorithms
