- Volume 25 Issue 3
Outlier detection in a linear regression model can fail when similar observations are classified differently during a sequential procedure. In such cases, identifying clusters of observations and applying a detection method to the clustered data can prevent this failure and, because the data are reduced, is also computationally efficient. In this paper, we propose a clustering procedure for this purpose and illustrate it with examples in which it is applied to the Hadi-Simonoff (1993) method, the reverse Hadi-Simonoff method, and the Gentleman-Wilk (1975) method.
Clustering; linear regression model; outliers; regression diagnostics
- Ahn, B. J. and Seo, H. S. (2011). Outlier detection using dynamic plots, The Korean Journal of Applied Statistics, 24, 979-986. https://doi.org/10.5351/KJAS.2011.24.5.979
- Atkinson, A. C. (1994). Fast very robust methods for the detection of multiple outliers, Journal of the American Statistical Association, 89, 1329-1339. https://doi.org/10.1080/01621459.1994.10476872
- Atkinson, A. C., Riani, M. and Cerioli, A. (2004). Exploring Multivariate Data with the Forward Search, Springer, New York.
- Cormack, R. M. (1971). A review of classification, Journal of the Royal Statistical Society, Series A, 134, 321-367. https://doi.org/10.2307/2344237
- Gentleman, J. F. and Wilk, M. B. (1975). Detecting outliers. II. Supplementing the direct analysis of residuals, Biometrics, 31, 387-410. https://doi.org/10.2307/2529428
- Gray, J. B. and Ling, R. F. (1984). K-clustering as a detection tool for influential subsets in regression, Technometrics, 26, 305-318.
- Hadi, A. S. and Simonoff, J. S. (1993). Procedures for the identification of multiple outliers in linear models, Journal of the American Statistical Association, 88, 1264-1272. https://doi.org/10.1080/01621459.1993.10476407
- Jajo, N. K. (2005). A review of robust regression and diagnostic procedures in linear regression, Acta Mathematicae Applicatae Sinica, 21, 209-224. https://doi.org/10.1007/s10255-005-0230-2
- Kaufman, L. and Rousseeuw, P. J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, New York.
- Kianifard, F. and Swallow, W. H. (1989). Using recursive residuals, calculated on adaptively-ordered observations, to identify outliers in linear regression, Biometrics, 45, 571-585. https://doi.org/10.2307/2531498
- Kianifard, F. and Swallow, W. H. (1990). A Monte Carlo comparison of five procedures for identifying outliers in linear regression, Communications in Statistics, 19, 1913-1938. https://doi.org/10.1080/03610929008830300
- Ling, R. F. (1972). On the theory and construction of k-clusters, Computer Journal, 15, 326-332. https://doi.org/10.1093/comjnl/15.4.326
- Marasinghe, M. G. (1985). A multistage procedure for detecting several outliers in linear regression, Technometrics, 27, 395-399. https://doi.org/10.1080/00401706.1985.10488078
- Paul, S. R. and Fung, K. Y. (1991). A generalized extreme studentized residual multiple-outlier-detection procedure in linear regression, Technometrics, 33, 339-348. https://doi.org/10.1080/00401706.1991.10484839
- Pena, D. and Yohai, V. J. (1999). A fast procedure for outlier diagnostics in linear regression problems, Journal of the American Statistical Association, 94, 434-445.
- Rousseeuw, P. J. (1984). Least median of squares regression, Journal of the American Statistical Association, 79, 871-880. https://doi.org/10.1080/01621459.1984.10477105
Supported by: Konkuk University