Clustering Observations for Detecting Multiple Outliers in Regression Models

- Journal title : Korean Journal of Applied Statistics
- Volume 25, Issue 3, 2012, pp.503-512
- Publisher : The Korean Statistical Society
- DOI : 10.5351/KJAS.2012.25.3.503

Title & Authors

Clustering Observations for Detecting Multiple Outliers in Regression Models

Seo, Han-Son; Yoon, Min

Abstract

Sequential procedures for detecting outliers in a linear regression model can fail when similar observations are classified differently along the way. In such cases, identifying clusters of observations and applying the detection method to the clustered data can prevent this failure, and it is also computationally efficient because the data are reduced. In this paper, we propose a clustering procedure for this purpose and illustrate it with examples in which it is applied to the Hadi-Simonoff (1993) method, the reverse Hadi-Simonoff method, and the Gentleman-Wilk (1975) method.
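
The general idea of treating clusters, rather than single observations, as the unit of an outlier test can be sketched as follows. This is only an illustrative sketch and not the procedure of the paper: the single-linkage grouping, the `radius` and `cutoff` parameters, the crude deletion score, and the toy data are all assumptions made for the example, and none of the Hadi-Simonoff or Gentleman-Wilk tests is implemented here.

```python
import math

def fit_line(points):
    """Ordinary least squares for y = a + b*x; returns (a, b, sigma)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    b = sum((x - mx) * (y - my) for x, y in points) / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in points)
    return a, b, math.sqrt(rss / (n - 2))

def single_linkage_clusters(points, radius):
    """Group points connected by chains of neighbours within `radius`."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= radius:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

def flag_outlying_clusters(points, radius, cutoff=3.0):
    """Delete each cluster in turn, refit, and score the deleted cluster.

    The score is the mean absolute residual of the cluster members under
    the fit to the remaining data, divided by that fit's residual standard
    deviation. It is a hypothetical, simplified statistic.
    """
    clusters = single_linkage_clusters(points, radius)
    flagged = []
    for members in clusters:
        rest = [p for i, p in enumerate(points) if i not in members]
        if len(rest) < 3:  # too few points left to estimate sigma
            continue
        a, b, sigma = fit_line(rest)
        mean_abs = sum(abs(points[i][1] - a - b * points[i][0])
                       for i in members) / len(members)
        if mean_abs / sigma > cutoff:
            flagged.append(members)
    return clusters, flagged

# Toy data (made up): ten points near y = 2x + 1 plus three similar outliers.
noise = [0.1, -0.2, 0.15, -0.1, 0.05, -0.15, 0.2, -0.05, 0.1, -0.1]
points = [(float(x), 2 * x + 1 + e) for x, e in enumerate(noise)]
points += [(12.0, 5.0), (12.2, 5.1), (11.9, 4.8)]

clusters, flagged = flag_outlying_clusters(points, radius=1.0)
print(len(clusters), flagged)
```

Because the three similar outliers are grouped before any test is applied, they are kept or rejected together, and the number of deletion fits equals the number of clusters rather than the number of observations.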

Keywords

Clustering; linear regression model; outliers; regression diagnostics

Language

Korean

References

1.

Ahn, B. J. and Seo, H. S. (2011). Outlier detection using dynamic plots, The Korean Journal of Applied Statistics, 24, 979-986.

2.

Atkinson, A. C. (1994). Fast very robust methods for the detection of multiple outliers, Journal of the American Statistical Association, 89, 1329-1339.

3.

Atkinson, A. C., Riani, M. and Cerioli, A. (2004). Exploring Multivariate Data with the Forward Search, Springer, New York.

4.

Cormack, R. M. (1971). A review of classification, Journal of the Royal Statistical Society, Series A, 134, 321-367.

5.

Gentleman, J. F. and Wilk, M. B. (1975). Detecting outliers. II. Supplementing the direct analysis of residuals, Biometrics, 31, 387-410.

6.

Gray, J. B. and Ling, R. F. (1984). K-clustering as a detection tool for influential subsets in regression, Technometrics, 26, 305-318.

7.

Hadi, A. S. and Simonoff, J. S. (1993). Procedures for the identification of multiple outliers in linear models, Journal of the American Statistical Association, 88, 1264-1272.

8.

Jajo, N. K. (2005). A review of robust regression and diagnostic procedures in linear regression, Acta Mathematicae Applicatae Sinica, 21, 209-224.

9.

Kaufman, L. and Rousseeuw, P. J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, New York.

10.

Kianifard, F. and Swallow, W. H. (1989). Using recursive residuals, calculated on adaptively-ordered observations, to identify outliers in linear regression, Biometrics, 45, 571-585.

11.

Kianifard, F. and Swallow, W. H. (1990). A Monte Carlo comparison of five procedures for identifying outliers in linear regression, Communications in Statistics, 19, 1913-1938.

12.

13.

Marasinghe, M. G. (1985). A multistage procedure for detecting several outliers in linear regression, Technometrics, 27, 395-399.

14.

Paul, S. R. and Fung, K. Y. (1991). A generalized extreme studentized residual multiple-outlier-detection procedure in linear regression, Technometrics, 33, 339-348.

15.

Peña, D. and Yohai, V. J. (1999). A fast procedure for outlier diagnostics in linear regression problems, Journal of the American Statistical Association, 94, 434-445.