• Title, Summary, Keyword: Cook's distance

A Note on Cook's Distance in the Multivariate Linear Model

  • Bae, Whasoo;Hwang, Hyunmi;Kim, Choongrak
    • Communications for Statistical Applications and Methods / v.20 no.1 / pp.23-28 / 2013
  • We propose a version of Cook's distance (called local distance) in the multivariate linear model. The proposed version is a matrix, while the existing version of Cook's distance (called global distance) is a scalar. The existing Cook's distance is the trace of the proposed Cook's distance. In addition, we argue that the proposed Cook's distance is a more natural extension of Cook's distance in the univariate linear model than the existing one. An illustrative example based on a real data set is given.

Cutoff Values for Cook's Distance

  • Choongrak Kim
    • Communications for Statistical Applications and Methods / v.3 no.2 / pp.13-19 / 1996
  • Cook's distance (Cook, 1977) is one of the most widely used influence measures for assessing the influence of single observations or sets of observations in the linear regression model. For judging the computed distances, Cook (1977) suggested guidelines based on a confidence ellipsoid for the regression parameter $\beta$. In this paper, we suggest cutoff values for Cook's distance via Monte Carlo simulation and compare them with Cook's guidelines. An example based on a real data set is given.
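
As a rough illustration of the quantities this abstract refers to, the sketch below computes Cook's distance for an ordinary least squares fit and then simulates a cutoff for the largest distance. The function names `cooks_distance` and `mc_cutoff`, the choice of the maximum distance as the simulated statistic, and the synthetic data are all assumptions made for illustration; the paper's actual simulation design is not described in the abstract.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for each row of an OLS fit of y on X.

    X is the n-by-p design matrix (include a column of ones if an
    intercept is wanted). Uses the closed form
    D_i = e_i^2 * h_ii / (p * s^2 * (1 - h_ii)^2).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Leverages h_ii are the squared row norms of Q in the thin QR of X.
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                 # ordinary residuals
    s2 = e @ e / (n - p)             # residual variance estimate
    return e**2 * h / (p * s2 * (1.0 - h) ** 2)

def mc_cutoff(X, n_sim=1000, q=0.95, seed=0):
    """Hypothetical Monte Carlo cutoff: the q-quantile of max_i D_i
    under normal errors for a fixed design X.

    Cook's distance does not depend on the true coefficients or the
    error scale, so responses are simulated as pure standard normal noise.
    """
    rng = np.random.default_rng(seed)
    n = np.asarray(X).shape[0]
    max_d = np.empty(n_sim)
    for k in range(n_sim):
        y_sim = rng.standard_normal(n)
        max_d[k] = cooks_distance(X, y_sim).max()
    return float(np.quantile(max_d, q))

# Synthetic data, for illustration only.
rng = np.random.default_rng(42)
X_demo = np.column_stack([np.ones(30), rng.normal(size=30)])
y_demo = X_demo @ np.array([1.0, 2.0]) + rng.normal(size=30)
d_demo = cooks_distance(X_demo, y_demo)
cutoff_demo = mc_cutoff(X_demo, n_sim=500, q=0.95, seed=1)
flagged = np.flatnonzero(d_demo > cutoff_demo)  # indices exceeding the simulated cutoff
```

A simulated cutoff of this kind depends on the design matrix, which is one reason it can differ from a fixed guideline.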

A cautionary note on the use of Cook's distance

  • Kim, Myung Geun
    • Communications for Statistical Applications and Methods / v.24 no.3 / pp.317-324 / 2017
  • An influence measure known as Cook's distance has been used for judging the influence of each observation on the least squares estimate of the parameter vector. The distance does not reflect the distributional property of the change in the least squares estimator of the regression coefficients due to case deletions: the distribution has a covariance matrix of rank one and thus its support is determined by a line in multidimensional Euclidean space. As a result, Cook's distance may fail to provide correct information about influential observations, and we study some reasons for the failure. Three illustrative examples are provided, in which Cook's distance either fails to give the right information about influential observations or gives the right information about the most influential observation. We seek reasons for both outcomes.
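
The rank-one property mentioned in this abstract can be seen from the standard case-deletion identity. The lines below are a sketch under the usual assumptions, none of which are spelled out in the abstract: the model $y = X\beta + \varepsilon$ with $\operatorname{Var}(\varepsilon) = \sigma^2 I$, $p$ regression coefficients, $x_i^\top$ the $i$-th row of $X$, $e_i$ the $i$-th residual, $h_{ii}$ the $i$-th leverage, and $s^2$ the residual variance estimate.

```latex
\begin{align*}
\hat{\beta} - \hat{\beta}_{(i)}
  &= \frac{e_i}{1 - h_{ii}}\,(X^\top X)^{-1} x_i, \\
\operatorname{Cov}\bigl(\hat{\beta} - \hat{\beta}_{(i)}\bigr)
  &= \frac{\sigma^2}{1 - h_{ii}}\,(X^\top X)^{-1} x_i x_i^\top (X^\top X)^{-1}
  \qquad \text{since } \operatorname{Var}(e_i) = \sigma^2 (1 - h_{ii}), \\
D_i
  &= \frac{\bigl(\hat{\beta} - \hat{\beta}_{(i)}\bigr)^\top X^\top X \bigl(\hat{\beta} - \hat{\beta}_{(i)}\bigr)}{p\,s^2}.
\end{align*}
```

The change $\hat{\beta} - \hat{\beta}_{(i)}$ is a scalar multiple of the fixed vector $(X^\top X)^{-1} x_i$, so its covariance matrix has rank one and its distribution lies on a line, whereas $D_i$ measures the change in the metric $X^\top X$, which has full rank.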

A Comparison of Influence Diagnostics in Linear Mixed Models

  • Lee, Jang-Taek
    • Communications for Statistical Applications and Methods / v.10 no.1 / pp.125-134 / 2003
  • Standard estimation methods for linear mixed models are sensitive to influential observations. However, diagnostic tools and concepts for linear mixed models remain rudimentary, and further research is much needed. In this paper, we consider two diagnostics for evaluating the effects of individual observations on the estimation of fixed effects in linear mixed models: Cook's distance and COVRATIO. Results of our limited simulation study suggest that Cook's distance is not a good statistical quantity in linear mixed models, and that the calibration point for COVRATIO seems to be quite conservative.

The local influence of LIU type estimator in linear mixed model

  • Zhang, Lili;Baek, Jangsun
    • Journal of the Korean Data and Information Science Society / v.26 no.2 / pp.465-474 / 2015
  • In this paper, we study local influence analysis of the LIU type estimator in linear mixed models. Using the method proposed by Shi (1997), the local influence of the LIU type estimator is investigated under each of three disturbance models. Furthermore, we give the generalized Cook's distance to assess the influence, and illustrate the efficiency of the proposed method with an example.

A Study on Sensitivity Analysis in Ridge Regression (능형 회귀에서의 민감도 분석에 관한 연구)

  • Kim, Soon-Kwi
    • Journal of the Korean Society for Quality Management / v.19 no.1 / pp.1-15 / 1991
  • In this paper, we discuss and review various measures which have been presented for studying outliers, high-leverage points, and influential observations when ridge regression estimation is adopted. We derive the influence function for $\hat{\underline{\beta}}_R$, the ridge regression estimator, and discuss its various finite sample approximations when ridge regression is postulated. We also study several diagnostic measures such as Welsch-Kuh's distance and Cook's distance.

Influential Points in GLMs via Backwards Stepping

  • Jeong, Kwang-Mo;Oh, Hae-Young
    • Communications for Statistical Applications and Methods / v.9 no.1 / pp.197-212 / 2002
  • When assessing the goodness-of-fit of a model, a small subset of deviating observations can give rise to a significant lack of fit. It is therefore important to identify such observations and to assess their effects on various aspects of the analysis. Cook's distance is usually used to detect influential observations, but it is sometimes not fully effective in identifying a truly influential set of observations because masking or swamping effects may exist. In this paper we confine our attention to influential subsets in GLMs such as logistic regression models and loglinear models. We modify a backwards stepping algorithm, which was originally suggested for detecting outlying cells in contingency tables, to detect influential observations in GLMs. The algorithm consists of two steps, the identification step and the testing step. In the identification step we identify influential observations based on influence measures such as Cook's distance. In the testing step we test whether the subset of identified observations is significant. Finally, we explain the proposed method through two types of data sets, related to the logistic regression model and the loglinear model, respectively.

Outlier Detection and Treatment for the Conversion of Chemical Oxygen Demand to Total Organic Carbon (화학적산소요구량의 총유기탄소 변환을 위한 이상자료의 탐지와 처리)

  • Cho, Beom Jun;Cho, Hong Yeon;Kim, Sung
    • Journal of Korean Society of Coastal and Ocean Engineers / v.26 no.4 / pp.207-216 / 2014
  • Total organic carbon (TOC) is an important indicator used as a direct biological index in marine carbon cycle research. Because available TOC data are scarce relative to Chemical Oxygen Demand (COD) data, sufficient TOC estimates can be produced from the COD data. Outlier detection and treatment (removal) should be carried out reasonably and objectively because the COD-TOC conversion equation directly affects the TOC estimates. This study aims to suggest the optimal regression model using the available salinity, COD, and TOC data observed in the Korean coastal zone. The optimal regression model is selected by comparing, across diverse methods for detecting outliers and influential observations, the changes in the number of data points before and after removal, the variation coefficients, and the root mean square (RMS) error. The results show that a diagnostic combining the SIQR (Semi-Inter-Quartile Range) boxplot and Cook's distance is the most suitable for outlier detection. The optimal regression function is estimated as TOC (mg/L) = $0.44 \cdot \mathrm{COD\,(mg/L)} + 1.53$, with a coefficient of determination of 0.47 and an RMS error of 0.85 mg/L. The RMS error and the variation coefficients of the leverage values are greatly reduced, to 31% and 80%, respectively, of their values before outlier removal. The method suggested in this study can provide a more appropriate regression curve because the excessive impact of the outliers frequently included in COD and TOC monitoring data is removed.
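
The reported conversion is a one-line linear map, sketched below. The function name `cod_to_toc` and the sample COD value are assumptions for illustration; only the slope 0.44 and intercept 1.53 come from the abstract.

```python
def cod_to_toc(cod_mg_per_l: float) -> float:
    """Estimate TOC (mg/L) from COD (mg/L) using the regression
    reported in the abstract: TOC = 0.44 * COD + 1.53."""
    return 0.44 * cod_mg_per_l + 1.53

# Illustrative input only; a COD of 5.0 mg/L maps to roughly 3.73 mg/L TOC.
toc_estimate = cod_to_toc(5.0)
```

With a reported coefficient of determination of 0.47 and an RMS error of 0.85 mg/L, individual estimates from this equation carry substantial uncertainty.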

Regression and Correlation Analysis via Dynamic Graphs

  • Kang, Hee Mo;Sim, Songyong
    • Communications for Statistical Applications and Methods / v.10 no.3 / pp.695-705 / 2003
  • In this article, we propose regression and correlation analysis via dynamic graphs and implement it in Java Web Start. For polynomial relations between dependent and independent variables, dynamic graphics are implemented for both polynomial regression and spline estimates to allow instant model selection. The results include basic statistics. They are available both as a web-based service and as an application.