Clustering Observations for Detecting Multiple Outliers in Regression Models

Seo, Han-Son; Yoon, Min

  • Received : 2012.03.02
  • Accepted : 2012.04.17
  • Published : 2012.06.30


Outlier detection in a linear regression model can fail when similar observations are classified differently during a sequential procedure. In such cases, identifying clusters of observations and applying the detection method to the clustered data can prevent this failure, and it is also computationally efficient because the data are reduced. In this paper, we propose a clustering procedure for this purpose and provide examples that illustrate the procedure applied to the Hadi-Simonoff (1993) method, the reverse Hadi-Simonoff method, and the Gentleman-Wilk (1975) method.
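
The general idea of treating a cluster of similar observations as a single unit can be sketched in a few lines. The sketch below is only an illustration under stated assumptions, not the procedure proposed in the paper: the single-linkage grouping rule, the deletion-type residual check, the function names, the `radius` and `cutoff` parameters, and the toy data are all hypothetical choices made for this example.

```python
import math

def single_linkage_clusters(points, radius):
    """Group points linked by chains of neighbours within `radius` (union-find)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= radius:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

def fit_line(points):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return my - slope * mx, slope

def flag_outlying_clusters(points, radius, cutoff):
    """Flag each cluster whose members lie far from the fit to all other points."""
    flagged = []
    for cluster in single_linkage_clusters(points, radius):
        members = set(cluster)
        rest = [p for i, p in enumerate(points) if i not in members]
        if len(rest) < 3:
            continue  # too few remaining points to estimate a residual scale
        a, b = fit_line(rest)
        sse = sum((y - (a + b * x)) ** 2 for x, y in rest)
        scale = math.sqrt(sse / (len(rest) - 2))
        worst = max(abs(points[i][1] - (a + b * points[i][0])) for i in cluster)
        if worst > cutoff * scale:
            flagged.append(cluster)
    return flagged

# Hypothetical toy data: ten points near y = 2x + 1 plus two nearby outliers.
points = [(float(x), 2.0 * x + 1.0 + (0.1 if x % 2 == 0 else -0.1)) for x in range(10)]
points += [(10.0, 40.0), (10.3, 40.5)]

print(flag_outlying_clusters(points, radius=1.0, cutoff=3.0))
```

Because the two discrepant observations are removed together, the fit to the remaining points is not distorted by either of them, which is the kind of situation where clustering before detection is meant to help.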


Clustering; linear regression model; outliers; regression diagnostics


  1. Ahn, B. J. and Seo, H. S. (2011). Outlier detection using dynamic plots, The Korean Journal of Applied Statistics, 24, 979-986.
  2. Atkinson, A. C. (1994). Fast very robust methods for the detection of multiple outliers, Journal of the American Statistical Association, 89, 1329-1339.
  3. Atkinson, A. C., Riani, M. and Cerioli, A. (2004). Exploring Multivariate Data with the Forward Search, Springer, New York.
  4. Cormack, R. M. (1971). A review of classification, Journal of the Royal Statistical Society, Series A, 134, 321-367.
  5. Gentleman, J. F. and Wilk, M. B. (1975). Detecting outliers. II. Supplementing the direct analysis of residuals, Biometrics, 31, 387-410.
  6. Gray, J. B. and Ling, R. F. (1984). K-clustering as a detection tool for influential subsets in regression, Technometrics, 26, 305-318.
  7. Hadi, A. S. and Simonoff, J. S. (1993). Procedures for the identification of multiple outliers in linear models, Journal of the American Statistical Association, 88, 1264-1272.
  8. Jajo, N. K. (2005). A review of robust regression and diagnostic procedures in linear regression, Acta Mathematicae Applicatae Sinica, 21, 209-224.
  9. Kaufman, L. and Rousseeuw, P. J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, New York.
  10. Kianifard, F. and Swallow, W. H. (1989). Using recursive residuals, calculated on adaptively-ordered observations, to identify outliers in linear regression, Biometrics, 45, 571-585.
  11. Kianifard, F. and Swallow, W. H. (1990). A Monte Carlo comparison of five procedures for identifying outliers in linear regression, Communications in Statistics, 19, 1913-1938.
  12. Ling, R. F. (1972). On the theory and construction of k-clusters, Computer Journal, 15, 326-332.
  13. Marasinghe, M. G. (1985). A multistage procedure for detecting several outliers in linear regression, Technometrics, 27, 395-399.
  14. Paul, S. R. and Fung, K. Y. (1991). A generalized extreme studentized residual multiple-outlier-detection procedure in linear regression, Technometrics, 33, 339-348.
  15. Pena, D. and Yohai, V. J. (1999). A fast procedure for outlier diagnostics in linear regression problems, Journal of the American Statistical Association, 94, 434-445.
  16. Rousseeuw, P. J. (1984). Least median of squares regression, Journal of the American Statistical Association, 79, 871-880.


Supported by : Konkuk University