Outlier Detection Using Support Vector Machines

서포트벡터 기계를 이용한 이상치 진단

  • Seo, Han-Son (Department of Applied Statistics, Konkuk University) ;
  • Yoon, Min (Department of Statistics, Pukyong National University)
  • 서한손 (건국대학교 응용통계학과) ;
  • 윤민 (부경대학교 통계학과)
  • Received : 2011.02
  • Accepted : 2011.03
  • Published : 2011.03.31

Abstract

Constructing approximation functions from real data requires removing outliers from the measured raw data before the model is built. Conventionally, visualization and the maximum residual error have been used for outlier detection, but they often fail to detect outliers for nonlinear functions with multidimensional inputs. Standard outlier detection methods based on support vector regression perform well for such functions, but they raise practical issues of computational cost and parameter adjustment. In this paper we propose a practical approach to outlier detection using support vector regression that reduces computation time and defines a suitable outlier threshold. We apply the approach to real data examples to demonstrate its validity.
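The abstract does not state the proposed threshold rule, so the following is only a minimal sketch of the general idea it builds on: fit a regression function (for instance by support vector regression), then flag observations whose residual is unusually large. The function name `flag_outliers`, the cutoff of median plus `k` times the scaled median absolute deviation, and the toy data are all assumptions made for illustration; they are not the authors' method.

```python
from statistics import median


def flag_outliers(y_true, y_pred, k=3.0):
    """Return indices whose absolute residual exceeds median + k * scaled MAD.

    The cutoff rule is an illustrative assumption, not the paper's threshold.
    """
    residuals = [abs(t - p) for t, p in zip(y_true, y_pred)]
    med = median(residuals)
    # 1.4826 rescales the MAD so it is comparable to a standard deviation
    # when the residuals are roughly normal.
    mad = 1.4826 * median(abs(r - med) for r in residuals)
    threshold = med + k * mad
    return [i for i, r in enumerate(residuals) if r > threshold]


# Toy data (hypothetical): y_pred stands in for the output of a fitted
# regressor such as an SVR; the fifth observation is grossly off.
y_pred = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y_true = [1.1, 1.7, 3.2, 4.4, 10.0, 5.75, 7.15, 8.35]

outlier_idx = flag_outliers(y_true, y_pred)
print(outlier_idx)
```

A median-based cutoff is used here because a single gross outlier inflates the mean and standard deviation of the residuals, which can mask the very point being sought. When more than half of the residuals are identical the MAD collapses to zero, so a practical version would need a fallback scale.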
