Imbalanced Data Augmentation Based on Semi-Hard Example Mining for Financial Fraud Detection Systems

  • Kyungtae Kang (Department of Management Information Systems, Dong-A University) ;
  • Sungjae Kim (Institute of Global Finance, Dong-A University) ;
  • Yongbok Cho (Department of Management Information Systems, Dong-A University)
  • Received : 2025.06.16
  • Reviewed : 2025.08.18
  • Published : 2025.08.31

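Abstract

The recent surge in fraudulent financial transactions is causing enormous economic losses. In the data used for fraud detection, however, fraudulent transactions are extremely rare compared with legitimate ones, which raises an imbalanced-data problem that hinders effective detection. To overcome this limitation, this study proposes a model that combines a VAE-GAN (Variational Autoencoder-Generative Adversarial Network) with Semi-Hard Example Mining to reduce false negatives, that is, fraudulent transactions judged to be legitimate, while maintaining the quality of the fraud data. First, the VAE-GAN generates synthetic minority-class data that resembles real transactions; Semi-Hard Example Mining then concentrates regeneration on the cases the classifier is most likely to confuse. Applied to a credit-card fraud dataset, the approach improved Recall and F2 score over existing interpolation-based oversampling techniques (SMOTE, Borderline-SMOTE, ADASYN) and over conventional VAE-GAN augmentation. This study is expected to help mitigate the imbalanced-data problem and maximize detection performance in financial-sector FDS (Fraud Detection Systems).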

The rapid rise in fraudulent financial transactions is inflicting substantial economic losses, yet effective detection remains difficult because fraud represents only a tiny fraction of overall activity. To overcome this extreme class-imbalance problem, we propose a model that integrates a Variational Autoencoder-Generative Adversarial Network (VAE-GAN) with Semi-Hard Example Mining (SHEM). The VAE-GAN synthesizes high-fidelity minority-class samples that closely mimic real transactions, while SHEM repeatedly targets borderline cases that the classifier is prone to misjudge, thereby reducing false negatives (fraudulent transactions incorrectly labeled as legitimate). Experiments on a benchmark credit-card fraud dataset show that our method consistently outperforms interpolation-based oversampling techniques (SMOTE, Borderline-SMOTE, ADASYN) and a vanilla VAE-GAN baseline, achieving higher Precision, Recall, F1, and F2 scores. These results demonstrate the model's potential to alleviate class imbalance and maximize detection performance in financial-sector fraud detection systems (FDS).
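The abstract does not state the exact rule SHEM uses to pick borderline cases, so the following is only a minimal sketch under a stated assumption: a fraud sample counts as "semi-hard" when the classifier's predicted fraud probability falls inside a band around the decision threshold. The function names `semi_hard_indices` and `f_beta`, the band limits 0.3 and 0.7, and the sample scores are all hypothetical and are not taken from the paper. The second function shows the standard F-beta formula, where beta = 2 gives the F2 score reported above.

```python
from typing import List, Sequence


def semi_hard_indices(fraud_probs: Sequence[float],
                      low: float = 0.3,
                      high: float = 0.7) -> List[int]:
    """Indices of fraud samples with predicted fraud probability in [low, high).

    Assumption (not from the paper): such samples are neither confidently
    detected nor hopelessly missed, so they are candidates for regeneration.
    """
    if not 0.0 <= low < high <= 1.0:
        raise ValueError("require 0 <= low < high <= 1")
    return [i for i, p in enumerate(fraud_probs) if low <= p < high]


def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta from confusion counts; beta=2 weights recall above precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical classifier scores for six known fraud transactions.
probs = [0.95, 0.55, 0.10, 0.42, 0.69, 0.70]
print(semi_hard_indices(probs))                   # [1, 3, 4]
print(round(f_beta(tp=80, fp=40, fn=20), 4))      # 0.7692
```

In a pipeline like the one described, the selected indices would identify which minority samples to feed back to the generator. Because F2 penalizes false negatives more heavily than false positives, it suits a setting where missed fraud is the costlier error.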
