• Title, Summary, Keyword: outliers

Process modeling using artificial neural network in the presence of outliers

  • 고영철;박화규;봉복준;손주찬;왕지남
    • Proceedings of the Korean Operations and Management Science Society Conference / pp.177-180 / 1997
  • Outliers, unexpected extraordinary observations that look discordant from most observations in a data set, are commonplace in many kinds of data analysis. Because outliers can seriously affect model identification, this paper presents ways of handling outliers in a given data set and of specifying a model in their presence. We propose a neural-network-based procedure that identifies outliers, removes their effects, and specifies a model for the underlying process. Unlike traditional parametric methods, which require the model's structure and parameters to be estimated before outliers can be detected, the proposed procedure is nonparametric: it needs no such estimation before handling outliers, and it can be applied to real problems in the presence of outliers. The methodology proceeds as follows. First, an outlier-detection neural network detects outliers and replaces them with predicted values. A neural network is then retrained on the data set from which the effect of outliers has been removed. In this way the effects of outliers are removed and modeling precision can be improved. Experimental results show that the proposed method is suitable for predicting data sets in the presence of outliers.

The Identification Of Multiple Outliers

  • Park, Jin-Pyo
    • Journal of the Korean Data and Information Science Society / v.11 no.2 / pp.201-215 / 2000
  • The classical method for regression analysis is least squares. However, if the data contain significant outliers, the least squares estimator can break down. Robust methods are an important complement to least squares as a remedy for this problem. Robust methods downweight or completely ignore the outliers. This is not always best, because outliers can contain very important information about the population. If outliers can be detected, they can be inspected further and appropriate action can be taken based on the results. In this paper, I propose a sequential outlier test to identify outliers. It is based on the nonrobust and robust estimates of scatter of robust regression residuals and is applied in a forward procedure, removing the most extreme observation at each step until the test fails to detect outliers. Unlike other forward procedures, the present one is unaffected by swamping or masking effects because the statistic is based on robust regression residuals. I show the asymptotic distribution of the test statistic and apply the test to several real and simulated data sets, where it performs fairly well.

Simultaneous Identification of Multiple Outliers and High Leverage Points in Linear Regression

  • Rahmatullah Imon, A.H.M.;Ali, M. Masoom
    • Journal of the Korean Data and Information Science Society / v.16 no.2 / pp.429-444 / 2005
  • The identification of unusual observations such as outliers and high leverage points has drawn a great deal of attention for many years. Most of these identification techniques are based on case deletion, which focuses more on outliers than on high leverage points. But residuals together with leverage values may cause masking and swamping, so that a good number of unusual observations remain undetected in the presence of multiple outliers and multiple high leverage points. In this paper we propose a new procedure to identify outliers and high leverage points simultaneously. We suggest an additive form of the residuals and the leverages that gives almost equal focus to outliers and leverages. We analyze several well-referenced data sets and discover a few outliers and high leverage points that were undetected by existing diagnostic techniques.

Weight Reduction Method for Outlier in Survey Sampling

  • Kim Jin
    • Communications for Statistical Applications and Methods / v.13 no.1 / pp.19-27 / 2006
  • Outliers in surveys are a perennial problem for applied survey statisticians estimating the population total or mean. The influence of outliers grows when they carry large weights in survey sampling. Many techniques have been studied to lower the impact of outliers on sample survey estimates. Outliers can be downweighted by winsorization or by reducing their weights. Weight reduction is more reasonable than replacing an outlier with a non-outlier value, because the outlier still represents at least one unit. In this paper, we suggest the square root transformation of the weight as a weight reduction method. We show with real data that this method is efficient, and it is also easy to apply in practice.
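
The weight reduction idea in this abstract can be illustrated with a minimal Python sketch. It assumes the outlying units have already been flagged and that the reduced weight is simply the square root of the original sampling weight; the abstract does not say how outliers are detected or whether the remaining weights are rescaled, so the function name `weighted_total` and its arguments are hypothetical.

```python
import math


def weighted_total(values, weights, outlier_flags, reduce_weight=True):
    """Estimate a population total as the sum of weight * value.

    Assumption: for units flagged as outliers the sampling weight w is
    replaced by sqrt(w). For w >= 1 the reduced weight is still >= 1, so the
    outlier keeps representing at least itself.
    """
    total = 0.0
    for y, w, is_outlier in zip(values, weights, outlier_flags):
        if reduce_weight and is_outlier:
            w = math.sqrt(w)  # square root transformation of the weight
        total += w * y
    return total


# Hypothetical sample: three units with weight 100, the last one an outlier.
values = [10.0, 12.0, 500.0]
weights = [100.0, 100.0, 100.0]
flags = [False, False, True]

print(weighted_total(values, weights, flags, reduce_weight=False))  # 52200.0
print(weighted_total(values, weights, flags))  # 7200.0
```

In this toy sample the single outlier accounts for most of the unreduced estimate, which is the situation the abstract describes for outliers with large weights.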

Outlier tests on potential outliers (잠재적 이상치군에 대한 검정)

  • Seo, Han Son
    • The Korean Journal of Applied Statistics / v.30 no.1 / pp.159-167 / 2017
  • Observations identified as potential outliers are usually tested for real outliers; however, some outlier detection methods skip a formal test or perform a test using simulated p-values. We introduce test procedures for outliers by testing subsets of potential outliers rather than by testing individual observations of potential outliers to avoid masking or swamping effects. Examples to illustrate methods and a Monte Carlo study to compare the power of the various methods are presented.

Joint Estimation of the Outliers Effect and the Model Parameters in ARMA Process

  • Lee, Kwang-Ho;Shin, Hye-Jung
    • Journal of the Korean Data and Information Science Society / v.6 no.2 / pp.41-50 / 1995
  • In this paper, we propose an iterative procedure that detects the locations of outliers and jointly estimates the outlier effects and the model parameters in the autoregressive moving average model with two types of outliers. The performance of the procedure is compared with that of Chen and Liu (1993) through Monte Carlo simulation. The proposed procedure is very robust in the sense that it can be applied to stationary time series models with any type of outlier.

The Sequential Testing of Multiple Outliers in Linear Regression

  • Park, Jinpyo;Park, Heechang
    • Communications for Statistical Applications and Methods / v.8 no.2 / pp.337-346 / 2001
  • In this paper we consider the problem of identifying and testing outliers in linear regression. First we consider the problem of testing the null hypothesis of no outliers. A test based on the ratio of two scale estimates is proposed. We show the asymptotic distribution of the test statistic by Monte Carlo simulation and investigate its properties. Next we consider the problem of identifying the outliers. A forward sequential procedure based on the suggested test is proposed and shown to perform fairly well. The forward sequential procedure is unaffected by masking and swamping effects because the test statistic is based on a robust estimate.
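
A forward sequential procedure of this general kind can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' method: the Theil-Sen fit stands in for whatever robust regression the paper uses, the ratio is taken as root-mean-square residual over a scaled median absolute deviation, the robust fit is computed once and not refitted, and the cutoff of 2.0 is a hypothetical placeholder for a critical value that the paper derives from the distribution of its test statistic.

```python
import math
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Robust simple-regression fit: median pairwise slope, median intercept."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    slope = median(slopes)
    intercept = median([yi - slope * xi for xi, yi in zip(x, y)])
    return intercept, slope


def scale_ratio(residuals):
    """Ratio of a nonrobust scale estimate (RMS) to a robust one (scaled MAD)."""
    nonrobust = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    center = median(residuals)
    robust = 1.4826 * median([abs(r - center) for r in residuals])
    if robust == 0.0:
        return math.inf if nonrobust > 0.0 else 1.0
    return nonrobust / robust


def forward_outlier_search(x, y, cutoff=2.0):
    """Flag the most extreme residual while the scale ratio exceeds the cutoff."""
    intercept, slope = theil_sen(x, y)  # robust fit, computed once
    residuals = {i: y[i] - (intercept + slope * x[i]) for i in range(len(x))}
    flagged = []
    # Never flag more than half of the observations.
    while len(flagged) < len(x) // 2 and scale_ratio(list(residuals.values())) > cutoff:
        worst = max(residuals, key=lambda i: abs(residuals[i]))
        flagged.append(worst)
        del residuals[worst]
    return flagged


# Hypothetical data: a line y = 1 + 2x with small noise and one gross outlier.
x = [float(i) for i in range(10)]
noise = [0.05, -0.03, 0.02, -0.05, 0.04, -0.01, 0.03, 0.0, -0.04, 0.01]
y = [1.0 + 2.0 * xi + e for xi, e in zip(x, noise)]
y[7] += 30.0
flagged = forward_outlier_search(x, y)
print(flagged)
```

Because the residuals come from a robust fit, a gross outlier inflates the nonrobust scale far more than the robust one, which is why the ratio can signal its presence without the outlier masking itself.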

Improving the Quality of Response Surface Analysis of an Experiment for Coffee-Supplemented Milk Beverage: I. Data Screening at the Center Point and Maximum Possible R-Square

  • Rheem, Sungsue;Oh, Sejong
    • Food Science of Animal Resources / v.39 no.1 / pp.114-120 / 2019
  • Response surface methodology (RSM) is a useful set of statistical techniques for modeling and optimizing responses in food science research. As a design for a response surface experiment, a central composite design (CCD) with multiple runs at the center point is frequently used. However, some of the responses at the center point are sometimes outliers, and these outliers are overlooked. Since the responses from center runs come from the same experimental conditions, there should be no outliers at the center point. Outliers at the center point ruin statistical analysis. Thus, the responses at the center point need to be looked at, and if outliers are observed, they have to be examined. If the reasons for the outliers are not errors in measuring or typing, such outliers need to be deleted. If the outliers are due to such errors, they have to be corrected. Through a re-analysis of a dataset published in the Korean Journal for Food Science of Animal Resources, we show that outlier elimination increased the maximum possible R-square that modeling of the data can attain, which enables us to improve the quality of response surface analysis.
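
The link between center-point outliers and the maximum possible R-square can be sketched in Python. The sketch assumes the usual definition, in which no model can explain the pure-error variation among replicated runs, so the ceiling is 1 - SS_pure_error / SS_total; the abstract does not spell out the formula, and the function name `max_possible_r2` and the data layout are hypothetical.

```python
from statistics import fmean


def max_possible_r2(runs):
    """Upper bound on R-square given replicated runs.

    `runs` maps each design point to the list of responses observed there.
    Assumption: max R-square = 1 - SS_pure_error / SS_total, where the pure
    error is the variation of replicates around their own design-point mean.
    """
    all_y = [y for ys in runs.values() for y in ys]
    grand_mean = fmean(all_y)
    ss_total = sum((y - grand_mean) ** 2 for y in all_y)
    if ss_total == 0.0:
        raise ValueError("responses are constant; R-square is undefined")
    ss_pure_error = sum(
        (y - fmean(ys)) ** 2 for ys in runs.values() for y in ys
    )
    return 1.0 - ss_pure_error / ss_total


# Hypothetical design: replicated center runs plus two single axial runs.
with_outlier = {"center": [10.0, 10.2, 9.8, 14.0], "axial_low": [6.0], "axial_high": [8.0]}
screened = {"center": [10.0, 10.2, 9.8], "axial_low": [6.0], "axial_high": [8.0]}

print(max_possible_r2(with_outlier))
print(max_possible_r2(screened))
```

With an outlier among the center runs, the pure-error sum of squares grows and the ceiling on R-square drops, so screening the center point raises the best fit any response surface model could reach.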

On Rice Estimator in Simple Regression Models with Outliers (이상치가 존재하는 단순회귀모형에서 Rice 추정량에 관해서)

  • Park, Chun Gun
    • The Korean Journal of Applied Statistics / v.26 no.3 / pp.511-520 / 2013
  • Outlier detection and robust estimation are crucial in regression models with outliers. Such studies focus on detecting outliers and estimating the coefficients using leave-one-out. Our study introduces the Rice estimator, an error variance estimator that does not require estimating the coefficients. In particular, we compare the statistical properties of the Rice estimator with and without outliers in simple regression models.
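
For reference, the Rice estimator is commonly written as the first-order difference-based variance estimate, with the responses ordered by the predictor: sum of squared successive differences divided by 2(n - 1). The Python sketch below assumes that standard form; the function name `rice_variance` is hypothetical, and the abstract itself does not state the formula.

```python
def rice_variance(x, y):
    """Rice's difference-based estimate of the error variance.

    No regression coefficients are estimated: the responses are sorted by x
    and the squared differences of neighbouring responses are averaged.
    """
    if len(x) != len(y) or len(y) < 2:
        raise ValueError("need at least two paired observations")
    ordered = [yi for _, yi in sorted(zip(x, y), key=lambda pair: pair[0])]
    n = len(ordered)
    squared_diffs = [(ordered[i] - ordered[i - 1]) ** 2 for i in range(1, n)]
    return sum(squared_diffs) / (2 * (n - 1))


# Hypothetical data: the same responses without and with one outlier.
x = [0.0, 1.0, 2.0, 3.0]
print(rice_variance(x, [0.0, 1.0, 0.0, 1.0]))   # 0.5
print(rice_variance(x, [0.0, 1.0, 10.0, 1.0]))
```

A single outlier enters two successive differences, so it inflates the estimate noticeably, which is the kind of behaviour a comparison with and without outliers would examine.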

The Forward Sequential Procedure for the Identifying Multiple Outliers in Linear Regression

  • Park, Jin-Pyo
    • Journal of the Korean Data and Information Science Society / v.16 no.4 / pp.1053-1066 / 2005
  • In this paper we consider the problem of identifying and testing outliers in linear regression. First we consider the use of the so-called scale ratio test for testing the null hypothesis of no outliers. This test is based on the ratio of two residual scale estimates. We show the asymptotic distribution of the test statistic and investigate its properties. Next we consider the problem of identifying the outliers. A forward sequential procedure using the suggested test is proposed. The new method is compared with the classical procedure in a real data example. Unlike other forward procedures, the present one is unaffected by masking and swamping effects because the test statistic is based on a robust scale estimate.
