• Title/Summary/Keyword: Outlier

Search Result 652, Processing Time 0.034 seconds

Identification of Incorrect Data Labels Using Conditional Outlier Detection

  • Hong, Charmgil
    • Journal of Korea Multimedia Society
    • /
    • v.23 no.8
    • /
    • pp.915-926
    • /
    • 2020
  • Outlier detection methods help one to identify unusual instances in data that may correspond to erroneous, exceptional, or surprising events or behaviors. This work studies conditional outlier detection, a special instance of the outlier detection problem, in the context of incorrect data label identification. Unlike conventional (unconditional) outlier detection methods that seek abnormalities across all data attributes, conditional outlier detection assumes data are given in pairs of input (condition) and output (response or label). Accordingly, the goal of conditional outlier detection is to identify incorrect or unusual output assignments considering their input as condition. As a solution to conditional outlier detection, this paper proposes the ratio-based outlier scoring (ROS) approach and its variant. The propose solutions work by adopting conventional outlier scores and are able to apply them to identify conditional outliers in data. Experiments on synthetic and real-world image datasets are conducted to demonstrate the benefits and advantages of the proposed approaches.

Improvement of PM Forecasting Performance by Outlier Data Removing (Outlier 데이터 제거를 통한 미세먼지 예보성능의 향상)

  • Jeon, Young Tae;Yu, Suk Hyun;Kwon, Hee Yong
    • Journal of Korea Multimedia Society
    • /
    • v.23 no.6
    • /
    • pp.747-755
    • /
    • 2020
  • In this paper, we deal with outlier data problems that occur when constructing a PM2.5 fine dust forecasting system using a neural network. In general, when learning a neural network, some of the data are not helpful for learning, but rather disturbing. Those are called outlier data. When they are included in the training data, various problems such as overfitting occur. In building a PM2.5 fine dust concentration forecasting system using neural network, we have found several outlier data in the training data. We, therefore, remove them, and then make learning 3 ways. Over_outlier model removes outlier data that target concentration is low, but the model forecast is high. Under_outlier model removes outliers data that target concentration is high, but the model forecast is low. All_outlier model removes both Over_outlier and Under_outlier data. We compare 3 models with a conventional outlier removal model and non-removal model. Our outlier removal model shows better performance than the others.

First Order Difference-Based Error Variance Estimator in Nonparametric Regression with a Single Outlier

  • Park, Chun-Gun
    • Communications for Statistical Applications and Methods
    • /
    • v.19 no.3
    • /
    • pp.333-344
    • /
    • 2012
  • We consider some statistical properties of the first order difference-based error variance estimator in nonparametric regression models with a single outlier. So far under an outlier(s) such difference-based estimators has been rarely discussed. We propose the first order difference-based estimator using the leave-one-out method to detect a single outlier and simulate the outlier detection in a nonparametric regression model with the single outlier. Moreover, the outlier detection works well. The results are promising even in nonparametric regression models with many outliers using some difference based estimators.

Temporal and spatial outlier detection in wireless sensor networks

  • Nguyen, Hoc Thai;Thai, Nguyen Huu
    • ETRI Journal
    • /
    • v.41 no.4
    • /
    • pp.437-451
    • /
    • 2019
  • Outlier detection techniques play an important role in enhancing the reliability of data communication in wireless sensor networks (WSNs). Considering the importance of outlier detection in WSNs, many outlier detection techniques have been proposed. Unfortunately, most of these techniques still have some potential limitations, that is, (a) high rate of false positives, (b) high time complexity, and (c) failure to detect outliers online. Moreover, these approaches mainly focus on either temporal outliers or spatial outliers. Therefore, this paper aims to introduce novel algorithms that successfully detect both temporal outliers and spatial outliers. Our contributions are twofold: (i) modifying the Hampel Identifier (HI) algorithm to achieve high accuracy identification rate in temporal outlier detection, (ii) combining the Gaussian process (GP) model and graph-based outlier detection technique to improve the performance of the algorithm in spatial outlier detection. The results demonstrate that our techniques outperform the state-of-the-art methods in terms of accuracy and work well with various data types.

Assessing the Accuracy of Outlier Tests in Nonlinear Regression

  • Kahng, Myung-Wook;Kim, Bu-Yang
    • Communications for Statistical Applications and Methods
    • /
    • v.16 no.1
    • /
    • pp.163-168
    • /
    • 2009
  • Given the specific mean shift outlier model, the standard approaches to obtaining test statistics for outliers are discussed. Accuracy of outlier tests is investigated using subset curvatures. These subset curvatures appear to be reliable indicators of the adequacy of the linearization based test. Also, we consider obtaining graphical summaries of uncertainty in estimating parameters through confidence curves. The results are applied to the problem of assessing the accuracy of outlier tests.

Density-based Outlier Detection for Very Large Data (대용량 자료 분석을 위한 밀도기반 이상치 탐지)

  • Kim, Seung;Cho, Nam-Wook;Kang, Suk-Ho
    • Journal of the Korean Operations Research and Management Science Society
    • /
    • v.35 no.2
    • /
    • pp.71-88
    • /
    • 2010
  • A density-based outlier detection such as an LOF (Local Outlier Factor) tries to find an outlying observation by using density of its surrounding space. In spite of several advantages of a density-based outlier detection method, the computational complexity of outlier detection has been one of major barriers in its application. In this paper, we present an LOF algorithm that can reduce computation time of a density based outlier detection algorithm. A kd-tree indexing and approximated k-nearest neighbor search algorithm (ANN) are adopted in the proposed method. A set of experiments was conducted to examine performance of the proposed algorithm. The results show that the proposed method can effectively detect local outliers in reduced computation time.

A Multiple Imputation for Reducing Outlier Effect (이상점 영향력 축소를 통한 무응답 대체법)

  • Kim, Man-Gyeom;Shin, Key-Il
    • The Korean Journal of Applied Statistics
    • /
    • v.27 no.7
    • /
    • pp.1229-1241
    • /
    • 2014
  • Most of sampling surveys have outliers and non-response missing values simultaneously. In that case, due to the effect of outliers, the result of imputation is not good enough to meet a given precision. To overcome this situation, outlier treatment should be conducted before imputation. In this paper in order for reducing the effect of outlier, we study outlier imputation methods and outlier weight adjustment methods. For the outlier detection, the method suggested by She and Owen (2011) is used. A small simulation study is conducted and for real data analysis, Monthly Labor Statistic and Briquette Consumption Survey Data are used.

The Filtering Method to Reduce Corner Outlier Artifacts in HEVC (Corner Outlier Artifacts를 감소시키기 위한 HEVC 필터링 방법)

  • Ko, Kyung-hwan
    • Journal of Broadcast Engineering
    • /
    • v.22 no.3
    • /
    • pp.313-320
    • /
    • 2017
  • The In-loop filtering methods such as de-blocking filter and SAO(Sample Adaptive Offset) applied to the HEVC standard achieves coding efficiency and subjective quality improvement by reducing the blocking artifacts and the ringing artifacts. However, despite the use of In-loop filtering methods, the artifacts called a corner outlier occurring at the corner points of block boundaries are not removed. In this paper, the corner outlier artifacts are reduced by the detection, determination, and filtering processes on the corner outlier pixels. Experimental results show that the proposed method improves the subjective picture quality and slightly increases the coding efficiency in Inter prediction.

A Binary Prediction Method for Outlier Detection using One-class SVM and Spectral Clustering in High Dimensional Data (고차원 데이터에서 One-class SVM과 Spectral Clustering을 이용한 이진 예측 이상치 탐지 방법)

  • Park, Cheong Hee
    • Journal of Korea Multimedia Society
    • /
    • v.25 no.6
    • /
    • pp.886-893
    • /
    • 2022
  • Outlier detection refers to the task of detecting data that deviate significantly from the normal data distribution. Most outlier detection methods compute an outlier score which indicates the degree to which a data sample deviates from normal. However, setting a threshold for an outlier score to determine if a data sample is outlier or normal is not trivial. In this paper, we propose a binary prediction method for outlier detection based on spectral clustering and one-class SVM ensemble. Given training data consisting of normal data samples, a clustering method is performed to find clusters in the training data, and the ensemble of one-class SVM models trained on each cluster finds the boundaries of the normal data. We show how to obtain a threshold for transforming outlier scores computed from the ensemble of one-class SVM models into binary predictive values. Experimental results with high dimensional text data show that the proposed method can be effectively applied to high dimensional data, especially when the normal training data consists of different shapes and densities of clusters.

Efficient outlier removal algorithm for real-time panoramic stitching (실시간 파노라마 합성에서의 효과적인 outlier 제거 방법)

  • Kim, Beom Su;Cho, Nam Ik
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2011.07a
    • /
    • pp.513-516
    • /
    • 2011
  • 기존의 실시간 파노라마 합성 알고리즘에서는 매칭점과 입력 영상에서의 outlier를 구분하고 제거하기가 어렵기 때문에 노이즈가 많은 영상 또는 반복적인 패턴이 많은 영상에서 왜곡이 쉽게 발생하는 문제가 있다. 따라서 본 논문에서는 기존의 실시간 파노라마 합성 프레임웍에서 실시간 합성 조건을 만족시키면서 효과적으로 매칭점과 입력 영상에서의 outlier를 제거하는 방법을 제안한다. 이를 위해서 선형 모델에서 outlier을 제거하는 데 주로 사용되는 RANSAC 알고리즘을 실시간 파노라마 합성에서 사용되는 비선형 모델에 적용 가능하도록 수정하고 속도 향상을 위해서 사용되는 모델의 파라미터를 줄이는 방법을 제안한다. 이를 통하여 매칭점 중에 존재하는 outiler를 제거하고 전체 매칭점 중에서 inlier 비율을 이용하여 입력되는 영상시퀀스에서 outlier 영상을 제거하는 방법을 제안한다. 실험 결과 기존의 방법에 비해서 합성 결과의 왜곡이 줄어드는 것을 확인하였다.

  • PDF