Support Vector Machine Algorithm for Imbalanced Data Learning

  • 김광성 (Hyundai Information Technology) ;
  • 황두성 (Dankook University, College of Engineering, Computer Science)
  • Received : 2010.05.07
  • Accepted : 2010.06.25
  • Published : 2010.07.31

Abstract

This paper proposes an improved sequential minimal optimization (SMO) algorithm that solves the quadratic optimization problem arising in class-imbalanced learning. SMO is well suited to the optimization problem of a support vector machine that assigns different regularization values to the two classes. The proposed learning algorithm repeats a learning step that finds the current optimal solution for only two Lagrange variables, one selected from each class. The proposed algorithm is tested on UCI benchmark problems, and its generalization performance is compared with that of the SMO algorithm using the g-mean measure, which accounts for the imbalanced class distribution. Compared with SMO, the proposed algorithm improves the prediction rate on minority-class data and can shorten training time.
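The abstract does not spell out the g-mean measure. A minimal sketch follows, assuming the common definition as the geometric mean of sensitivity (minority-class recall) and specificity (majority-class recall). The function name `g_mean`, the label convention (+1 minority, -1 majority), and the toy data are illustrative assumptions, not taken from the paper.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of the two per-class recalls (assumed definition)."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # recall on the positive (minority) class
    specificity = tn / (tn + fp)  # recall on the negative (majority) class
    return math.sqrt(sensitivity * specificity)


# Hypothetical toy data: 95 majority examples (-1) and 5 minority examples (+1).
y_true = [-1] * 95 + [1] * 5

# A classifier that always predicts the majority class.
always_majority = [-1] * 100

# A classifier that makes 5 majority errors but finds 4 of 5 minority examples.
balanced_pred = [-1] * 90 + [1] * 5 + [1] * 4 + [-1] * 1

print(g_mean(y_true, always_majority))
print(g_mean(y_true, balanced_pred))
```

On this toy data the always-majority classifier reaches 95% accuracy yet its g-mean is zero, because it never predicts the minority class. This is why a measure of this kind is preferred over plain accuracy when the class distribution is imbalanced.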

