Identification of Regression Outliers Based on Clustering of LMS-residual Plots
 Title & Authors
Kim, Bu-Yong; Oh, Mi-Hyun
 Abstract
An algorithm is proposed for identifying multiple outliers in linear regression. It is based on clustering the residuals from least median of squares (LMS) estimation. A cut-height criterion for the hierarchical cluster tree is suggested, which yields the optimal clustering of the regression outliers. The effectiveness of the procedures is compared on classic and artificial data sets, and the proposed algorithm is shown to be superior to the one based on least squares estimation. In particular, the proposed algorithm deals very well with the masking and swamping effects, while the least squares based one does not.
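The general idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: the LMS fit is approximated by random elemental subsets, the residuals are scaled by a MAD-type estimate and clustered in one dimension with single linkage, and the cut height is a simple mean-plus-k-standard-deviations rule on the merge heights (with a hypothetical default k = 1.25) standing in for the paper's own criterion, which the abstract does not specify. The function names `lms_fit` and `flag_outliers` are likewise hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def lms_fit(X, y, n_trials=2000, seed=0):
    """Approximate least median of squares fit via random elemental subsets.

    Assumption: an exact LMS solver is replaced by a random search over
    exact fits to p points, keeping the fit with the smallest median
    squared residual.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_beta, best_crit = None, np.inf
    for _ in range(n_trials):
        idx = rng.choice(n, size=p, replace=False)
        try:
            beta = np.linalg.solve(X[idx], y[idx])
        except np.linalg.LinAlgError:
            continue  # singular subset, draw another one
        crit = np.median((y - X @ beta) ** 2)
        if crit < best_crit:
            best_beta, best_crit = beta, crit
    return best_beta


def flag_outliers(X, y, k=1.25):
    """Cluster scaled LMS residuals and flag points outside the main cluster."""
    beta = lms_fit(X, y)
    resid = y - X @ beta
    # MAD-type robust scale of the residuals (illustrative choice).
    scale = 1.4826 * np.median(np.abs(resid))
    z = (resid / scale).reshape(-1, 1)
    # Hierarchical cluster tree on the scaled residuals.
    tree = linkage(z, method="single")
    heights = tree[:, 2]
    # Assumed cut-height rule: mean + k * sd of the merge heights.
    cut = heights.mean() + k * heights.std(ddof=1)
    labels = fcluster(tree, t=cut, criterion="distance")
    # Treat the largest cluster as the clean data; the rest are flagged.
    main = np.bincount(labels).argmax()
    return np.flatnonzero(labels != main)
```

A call such as `flag_outliers(X, y)`, where `X` includes a column of ones for the intercept, returns the indices of the observations that fall outside the largest residual cluster. Because the residuals come from a robust fit rather than from least squares, a group of outliers is less likely to pull the fit toward itself, which is the masking problem the abstract refers to.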
 Keywords
regression outlier; robust residual; clustering; masking; swamping
 Language
Korean
 Cited by
1.
V-mask Type Criterion for Identification of Outliers in Logistic Regression, Communications for Statistical Applications and Methods, 2005, Vol. 12, No. 3, pp. 625-634
2.
A Criterion for the Selection of Principal Components in the Robust Principal Component Regression, Kim, Bu-Yong, Communications for Statistical Applications and Methods, 2011, Vol. 18, No. 6, pp. 761-770