Identification of Regression Outliers Based on Clustering of LMS-residual Plots

Title & Authors
Kim, Bu-Yong; Oh, Mi-Hyun

Abstract
An algorithm is proposed for identifying multiple outliers in linear regression. It is based on clustering the residuals from least median of squares (LMS) estimation. A cut-height criterion for the hierarchical cluster tree is suggested that yields the optimal clustering of the regression outliers. The effectiveness of the procedures is compared on classic and artificial data sets, and the proposed algorithm is shown to be superior to a comparable algorithm based on least squares (LS) estimation. In particular, the proposed algorithm deals very well with the masking and swamping effects, whereas the LS-based algorithm does not.
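The abstract does not give the details of the procedure, so the following is only a rough Python sketch of the general idea, built on several assumptions. LMS is approximated by the elemental-set (resampling) method for a single predictor, which is not necessarily the authors' algorithm. The "residual plot" is taken to be the points (fitted value / sd of fitted values, residual / s0), where s0 is the preliminary LMS scale of Rousseeuw and Leroy. The tree is built by single linkage, and Mojena's (1977) stopping rule stands in for the paper's cut-height criterion, which the abstract does not state. Cases outside the largest cluster are flagged as outliers. The function names `lms_line`, `mst_edges` and `flag_outliers` and the parameter `k` are hypothetical.

```python
import itertools
import math
import statistics


def lms_line(x, y):
    """Approximate LMS fit of y = a + b*x by exhaustive elemental sets: every
    line through two points is a candidate, and the one minimising the h-th
    smallest squared residual is kept, h = n//2 + (p+1)//2 with p = 2."""
    n = len(x)
    h = n // 2 + 1
    best, crit = None, math.inf
    for i, j in itertools.combinations(range(n), 2):
        if x[i] == x[j]:
            continue  # vertical pair: no finite slope
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        r2 = sorted((yk - a - b * xk) ** 2 for xk, yk in zip(x, y))
        if r2[h - 1] < crit:
            best, crit = (a, b), r2[h - 1]
    # Preliminary robust scale with Rousseeuw & Leroy's small-sample factor.
    s0 = 1.4826 * (1 + 5 / (n - 2)) * math.sqrt(crit)
    return best, max(s0, 1e-12)


def mst_edges(points):
    """Prim's algorithm. The single-linkage fusion heights are exactly the
    edge lengths of the Euclidean minimum spanning tree."""
    n = len(points)
    dist, parent, done = [math.inf] * n, [-1] * n, [False] * n
    dist[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not done[i]), key=dist.__getitem__)
        done[u] = True
        if parent[u] >= 0:
            edges.append((dist[u], parent[u], u))
        for v in range(n):
            d = math.dist(points[u], points[v])
            if not done[v] and d < dist[v]:
                dist[v], parent[v] = d, u
    return edges


def flag_outliers(x, y, k=3.0):
    """ASSUMED procedure, not the authors' exact method: single-linkage
    clustering of the standardized LMS residual plot (fitted, residual),
    tree cut by Mojena's rule (height > mean + k*sd of fusion heights),
    and every case outside the largest cluster flagged."""
    (a, b), s0 = lms_line(x, y)
    fitted = [a + b * xi for xi in x]
    sd_fit = statistics.stdev(fitted) or 1.0
    pts = [(f / sd_fit, (yi - f) / s0) for f, yi in zip(fitted, y)]
    edges = mst_edges(pts)
    heights = [w for w, _, _ in edges]
    cut = statistics.mean(heights) + k * statistics.stdev(heights)
    label = list(range(len(y)))  # union-find parent pointers

    def find(i):
        while label[i] != i:
            i = label[i]
        return i

    for w, u, v in edges:
        if w <= cut:  # keep only merges below the cut height
            label[find(u)] = find(v)
    groups = {}
    for i in range(len(y)):
        groups.setdefault(find(i), []).append(i)
    clean = max(groups.values(), key=len)
    return sorted(set(range(len(y))) - set(clean))


if __name__ == "__main__":
    # Hypothetical data: y = 2 + 0.5x plus small deterministic noise, with
    # the last three responses replaced by 20 to form an outlier cluster.
    xs = list(range(20))
    ys = [2 + 0.5 * xi + 0.1 * math.sin(1.7 * xi) for xi in xs]
    for i in (17, 18, 19):
        ys[i] = 20.0
    print(flag_outliers(xs, ys))
```

On the made-up data, the script should flag cases 17, 18 and 19. Single linkage is convenient here because its fusion heights coincide with the edges of the Euclidean minimum spanning tree. Cutting the tree at a given height therefore amounts to dropping every spanning-tree edge longer than that height and reading off the connected components.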
Language
Korean
Cited by
1.
V-mask Type Criterion for Identification of Outliers In Logistic Regression

Communications for Statistical Applications and Methods, 2005, Vol. 12, No. 3, pp. 625-634
2.
A Criterion for the Selection of Principal Components in the Robust Principal Component Regression, Kim, Bu-Yong

Communications for Statistical Applications and Methods, 2011, Vol. 18, No. 6, pp. 761-770