Credit Card Bad Debt Prediction Model based on Support Vector Machine

신용카드 대손회원 예측을 위한 SVM 모형

  • Received : 2012.10.22
  • Accepted : 2012.12.06
  • Published : 2012.12.31

Abstract

In this paper, credit card delinquency refers to the possibility that bad debt will occur in the near future in normal accounts that currently have no debt, and the problem is to predict, on a monthly basis, the occurrence of delinquency three months in advance. This is a typical binary classification problem, but it suffers from data imbalance: instances of the target class are very rare. To predict bad debt occurrence effectively, a Support Vector Machine (SVM) with the kernel trick is adopted, using credit card usage and payment patterns as its inputs. SVM is widely accepted in the data mining community because of its prediction accuracy and its low risk of overfitting. However, SVM is known to be limited in its ability to process large-scale data. To resolve the difficulties in applying SVM to bad debt prediction, two-stage clustering is suggested as an effective data reduction method, and ensembles of SVM models are adopted to mitigate the data imbalance intrinsic to the target problem. In experiments with real-world data from one of the major domestic credit card companies, the suggested approach achieves higher prediction accuracy than traditional data mining approaches that use neural networks, decision trees, or logistic regression. The SVM ensemble model trained on the T2 training set shows the best prediction results among the alternatives considered, and it is noteworthy that neural networks trained on T2 perform better than SVM trained on T1. These results show that the suggested approach is very effective both for SVM training and for classification under data imbalance.
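The combination described above, clustering to shrink the majority class followed by an ensemble of SVMs trained on balanced subsets, can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: a single pass of k-means stands in for the two-stage clustering, whose details the abstract does not give; a linear SVM trained by stochastic gradient descent replaces the kernel SVM so that the example needs no external library; and the two-dimensional toy data, the function names, and every parameter value are hypothetical.

```python
import random

def kmeans(points, k, iters=15, seed=0):
    """Lloyd's k-means; returns k centroids (assumed stand-in for two-stage clustering)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda j: sum((a - c) ** 2 for a, c in zip(p, centroids[j])))
            groups[nearest].append(p)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

def train_linear_svm(X, y, lam=0.01, eta=0.01, epochs=200, seed=0):
    """SGD on the L2-regularised hinge loss; labels must be -1 or +1."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1.0 - eta * lam) for wj in w]  # shrinkage from the L2 penalty
            if margin < 1.0:  # hinge loss is active: step toward the correct side
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def train_ensemble(majority, minority, k=30, members=5, seed=0):
    """Reduce the majority class to k centroids, then train class-balanced members."""
    rng = random.Random(seed)
    centroids = kmeans(majority, k, seed=seed)
    models = []
    for m in range(members):
        # each member sees every minority case plus an equal-sized draw of centroids
        sample = rng.sample(centroids, min(len(minority), k))
        X = sample + minority
        y = [-1] * len(sample) + [1] * len(minority)
        models.append(train_linear_svm(X, y, seed=seed + m))
    return models

def predict(models, x):
    """Majority vote over the members; +1 means predicted bad debt."""
    votes = sum(1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1
                for w, b in models)
    return 1 if votes > 0 else -1

# Hypothetical toy data: 300 normal accounts and 15 bad-debt accounts.
rng = random.Random(42)
normal = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(300)]
bad_debt = [[rng.gauss(4, 0.5), rng.gauss(4, 0.5)] for _ in range(15)]
models = train_ensemble(normal, bad_debt)
print(predict(models, [4.0, 4.0]), predict(models, [0.0, 0.0]))
```

Replacing the 300 majority points with 30 centroids shrinks each member's training set, and pairing a draw of centroids with all minority cases keeps each member's classes balanced. In a realistic setting a kernel SVM from an established library would take the place of `train_linear_svm`.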
