A sequential outlier detecting method using a clustering algorithm

군집 알고리즘을 이용한 순차적 이상치 탐지법

Seo, Han Son;Yoon, Min
서한손;윤민

  • Received : 2016.03.04
  • Accepted : 2016.04.16
  • Published : 2016.06.30

Abstract

Outlier detection methods that do not perform a test often fail to detect multiple outliers because they are structurally vulnerable to masking or swamping effects. This paper considers testing procedures that supplement a clustering-based method, which identifies the group containing a minority of the observations as outliers. A common step in such procedures is to perform some form of t-test on each individual outlier candidate. This paper instead proposes a sequential procedure that searches for outliers by changing the cutoff value on a cluster tree and performing a test on the resulting set of outlier candidates. The proposed method is illustrated and compared with existing methods through an example and Monte Carlo studies.
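The abstract gives no algorithmic detail, so the following is only a rough sketch of one way such a sequential search could look, not the authors' procedure. Several things are assumptions made for illustration: a single predictor, single-linkage clustering on standardized fitted values and residuals, cutoffs of the form "mean plus k standard deviations of the merge heights", the grid of k values, and an F statistic from the mean-shift outlier model standing in for the paper's test. All function names and the demo data are hypothetical.

```python
import math
import statistics


def fit_line(x, y):
    """Least-squares intercept, slope and residual sum of squares for y ~ x."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, rss


def standardize(values):
    mean, sd = statistics.fmean(values), statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def single_linkage_heights(points):
    """Merge heights of a single-linkage tree, i.e. the edge lengths of a
    minimum spanning tree (built here with Prim's algorithm)."""
    n = len(points)
    best = [math.dist(points[0], p) for p in points]
    in_tree = [False] * n
    in_tree[0] = True
    heights = []
    for _ in range(n - 1):
        j = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        heights.append(best[j])
        in_tree[j] = True
        for i in range(n):
            if not in_tree[i]:
                best[i] = min(best[i], math.dist(points[j], points[i]))
    return heights


def clusters_at(points, cutoff):
    """Cutting a single-linkage tree at `cutoff` yields the connected
    components of the graph joining points at distance <= cutoff."""
    n = len(points)
    label = [None] * n
    groups = []
    for start in range(n):
        if label[start] is not None:
            continue
        label[start] = len(groups)
        stack, members = [start], []
        while stack:
            i = stack.pop()
            members.append(i)
            for j in range(n):
                if label[j] is None and math.dist(points[i], points[j]) <= cutoff:
                    label[j] = len(groups)
                    stack.append(j)
        groups.append(sorted(members))
    return groups


def sequential_outlier_search(x, y, ks=(3.0, 2.0, 1.25)):
    """For each k (an assumed grid), cut the tree, treat everything outside the
    largest cluster as the candidate set, and compute an F statistic for it."""
    n, p = len(x), 2  # p = number of regression coefficients
    a, b, rss_full = fit_line(x, y)
    fitted = [a + b * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    points = list(zip(standardize(fitted), standardize(resid)))
    heights = single_linkage_heights(points)
    h_mean, h_sd = statistics.fmean(heights), statistics.stdev(heights)
    steps = []
    for k in ks:
        groups = clusters_at(points, h_mean + k * h_sd)
        clean = max(groups, key=len)
        candidates = sorted(set(range(n)) - set(clean))
        m = len(candidates)
        if m == 0 or 2 * m >= n or n - p - m <= 0:
            continue  # no candidates, not a minority, or no residual df left
        _, _, rss_clean = fit_line([x[i] for i in clean], [y[i] for i in clean])
        # Mean-shift outlier model: F with (m, n - p - m) degrees of freedom.
        f_stat = ((rss_full - rss_clean) / m) / (rss_clean / (n - p - m))
        steps.append((k, candidates, f_stat))
    return steps


# Hypothetical demo data: a straight line with small alternating noise
# and two planted outliers at indices 9 and 10.
x = [float(i) for i in range(20)]
y = [1.0 + 2.0 * xi + (0.1 if i % 2 == 0 else -0.1) for i, xi in enumerate(x)]
for i in (9, 10):
    y[i] += 15.0

for k, candidates, f_stat in sequential_outlier_search(x, y):
    print(k, candidates, round(f_stat, 1))
```

Each reported F statistic would then be compared with a quantile of the F distribution with (m, n - p - m) degrees of freedom. The sketch does not specify how to stop the search or how to adjust for testing at several cutoffs.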

Keywords

clustering;linear regression model;outlier test;sequential procedure

Acknowledgement

Supported by: Konkuk University