• Title/Summary/Keyword: Biased cross-validation

Bandwidth selections based on cross-validation for estimation of a discontinuity point in density (교차타당성을 이용한 확률밀도함수의 불연속점 추정의 띠폭 선택)

  • Huh, Jib
    • Journal of the Korean Data and Information Science Society / v.23 no.4 / pp.765-775 / 2012
  • Cross-validation is a popular method for selecting the bandwidth in all types of kernel estimation. Maximum likelihood cross-validation, least squares cross-validation and biased cross-validation have been proposed for bandwidth selection in kernel density estimation. For the case in which the probability density function has a discontinuity point, Huh (2012) proposed a bandwidth selection method using maximum likelihood cross-validation. In this paper, two forms of cross-validation with a one-sided kernel function are proposed for bandwidth selection in estimating the location and jump size of the discontinuity point of a density. These methods are motivated by least squares cross-validation and biased cross-validation. Simulated examples are used to compare the finite-sample performance of the two proposed methods with that of Huh (2012).
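
As a rough illustration of the least squares cross-validation idea mentioned in this abstract, the sketch below selects a bandwidth for an ordinary Gaussian-kernel density estimate by minimizing the standard LSCV criterion over a grid. It is not the one-sided-kernel method of the paper; the function names `lscv_score` and `select_bandwidth`, the simulated sample and the bandwidth grid are all assumptions made for the example.

```python
import math
import random

def lscv_score(data, h):
    """Standard LSCV criterion for a Gaussian-kernel density estimate:
    integral of fhat^2 minus (2/n) * sum of leave-one-out estimates."""
    n = len(data)
    int_f2 = 0.0  # accumulates (K*K)(u); K*K is the N(0, 2) density
    loo = 0.0     # accumulates K(u) over pairs with i != j
    for i in range(n):
        for j in range(n):
            u = (data[i] - data[j]) / h
            int_f2 += math.exp(-u * u / 4.0) / math.sqrt(4.0 * math.pi)
            if i != j:
                loo += math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi)
    int_f2 /= n * n * h
    loo /= n * (n - 1) * h  # equals (1/n) * sum_i fhat_{-i}(X_i)
    return int_f2 - 2.0 * loo

def select_bandwidth(data, grid):
    """Return the grid value that minimizes the LSCV criterion."""
    return min(grid, key=lambda h: lscv_score(data, h))

# Hypothetical data: 100 draws from a standard normal distribution.
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(100)]
grid = [0.05 * k for k in range(1, 41)]  # candidate bandwidths 0.05 .. 2.0
h_lscv = select_bandwidth(sample, grid)
print("selected bandwidth:", h_lscv)
```

A grid search is used here only for transparency; a numerical optimizer would be a reasonable alternative for larger samples.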

Development of Machine Learning Ensemble Model using Artificial Intelligence (인공지능을 활용한 기계학습 앙상블 모델 개발)

  • Lee, K.W.;Won, Y.J.;Song, Y.B.;Cho, K.S.
    • Journal of the Korean Society for Heat Treatment / v.34 no.5 / pp.211-217 / 2021
  • To predict the mechanical properties of secondary hardening martensitic steels, a machine learning ensemble model was established. Based on an ANN (Artificial Neural Network) architecture, several methods were considered to optimize the model. In particular, interaction features, which can reflect interactions between the chemical compositions and processing conditions of a real alloy system, were introduced by means of feature engineering, and then K-fold cross-validation coupled with a bagging ensemble was investigated to reduce R2_score and a factor indicating average learning errors owing to the biased experimental database.
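
The combination of K-fold cross-validation and a bagging ensemble described in this abstract can be sketched as follows. This is only a minimal illustration under stated assumptions: the base learner is a one-variable least-squares line rather than the ANN used in the paper, the data are synthetic, and the names `fit_line`, `bagged_fit`, `bagged_predict` and `k_fold_mse` are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    return my - b * mx, b

def bagged_fit(xs, ys, n_models, rng):
    """Fit one base model per bootstrap resample of the training data."""
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def bagged_predict(models, x):
    """Average the predictions of all base models."""
    return sum(a + b * x for a, b in models) / len(models)

def k_fold_mse(xs, ys, k, n_models, seed=0):
    """Mean squared error of the bagged ensemble on each of k held-out folds."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    scores = []
    for f in range(k):
        test = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        models = bagged_fit([xs[i] for i in train], [ys[i] for i in train],
                            n_models, rng)
        err = sum((ys[i] - bagged_predict(models, xs[i])) ** 2 for i in test)
        scores.append(err / len(test))
    return scores

# Synthetic data (an assumption): y = 1 + 2x plus small Gaussian noise.
data_rng = random.Random(1)
xs = [i / 10.0 for i in range(60)]
ys = [1.0 + 2.0 * x + data_rng.gauss(0.0, 0.1) for x in xs]
scores = k_fold_mse(xs, ys, k=5, n_models=20)
print("per-fold MSE:", scores)
```

Each fold is held out once, so every observation contributes to exactly one test error, while bagging averages over bootstrap resamples of the remaining training folds.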

AN EFFECTIVE BANDWIDTH SELECTOR IN A COMPLICATED KERNEL REGRESSION

  • Oh, Jong-Chul
    • Journal of applied mathematics & informatics / v.3 no.2 / pp.205-216 / 1996
  • The field of nonparametrics has shown its appeal in recent years with an array of new tools for statistical analysis. As one of those tools, nonparametric regression has become a prominent statistical research topic and has also been well established as a useful tool. In this article we investigate the biased cross-validation selector, BCV, which was proposed by Oh et al. (1995) for a less smoothing regression function. In the simulation study, the BCV selector is shown to perform well in practice with respect to the ASE ratio.

Deep Learning Model Validation Method Based on Image Data Feature Coverage (영상 데이터 특징 커버리지 기반 딥러닝 모델 검증 기법)

  • Lim, Chang-Nam;Park, Ye-Seul;Lee, Jung-Won
    • KIPS Transactions on Software and Data Engineering / v.10 no.9 / pp.375-384 / 2021
  • Deep learning techniques have been proven to perform well in image processing and are applied in various fields. The most widely used methods for validating a deep learning model include holdout validation, k-fold cross-validation, and the bootstrap method. These legacy methods consider the balance of the class ratio when dividing the data set, but do not consider the ratio of the various features that exist within the same class. If these features are not considered, validation results may be biased toward some features. Therefore, we propose a deep learning model validation method based on data feature coverage for image classification that improves on the legacy methods. The proposed technique defines a data feature coverage that numerically measures how well the data set used for training and validation of the deep learning model, and the evaluation data set, reflect the features of the entire data set. With this method, the data set can be divided so that coverage includes all features of the entire data set, and the evaluation result of the model can be analyzed in units of feature clusters. As a result, by providing feature cluster information alongside the evaluation result of the trained model, the method can provide feature information about the data that affects the trained model.
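
One possible reading of the coverage idea in this abstract is sketched below. It assumes each sample already carries a feature-cluster id (for example from clustering extracted image features), defines coverage as the fraction of clusters represented in a subset, and splits the data cluster by cluster so that both parts cover every cluster where possible. The definition of coverage, the function names `feature_coverage` and `coverage_split`, and the toy cluster ids are assumptions for illustration, not the paper's actual procedure.

```python
import random
from collections import defaultdict

def feature_coverage(subset_clusters, all_clusters):
    """Fraction of the feature clusters in the full data set that also
    appear in the subset (an assumed, simplified coverage measure)."""
    full = set(all_clusters)
    return len(set(subset_clusters) & full) / len(full)

def coverage_split(cluster_ids, test_fraction, seed=0):
    """Split sample indices so that every cluster contributes to the test
    part and, when it has more than one member, to the training part."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for i, c in enumerate(cluster_ids):
        by_cluster[c].append(i)
    train, test = [], []
    for members in by_cluster.values():
        rng.shuffle(members)
        n_test = max(1, round(len(members) * test_fraction))
        if len(members) > 1:
            n_test = min(n_test, len(members) - 1)  # keep one for training
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test

# Hypothetical cluster ids: three feature clusters of unequal size.
cluster_ids = [0] * 50 + [1] * 30 + [2] * 5
train_idx, test_idx = coverage_split(cluster_ids, test_fraction=0.2)
train_cov = feature_coverage([cluster_ids[i] for i in train_idx], cluster_ids)
test_cov = feature_coverage([cluster_ids[i] for i in test_idx], cluster_ids)
print("train coverage:", train_cov, "test coverage:", test_cov)
```

Because the split is made within each cluster, evaluation results can also be grouped by cluster id afterwards, which corresponds to analyzing results in units of feature clusters.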

Pairwise Neural Networks for Predicting Compound-Protein Interaction (약물-표적 단백질 연관관계 예측모델을 위한 쌍 기반 뉴럴네트워크)

  • Lee, Munhwan;Kim, Eunghee;Kim, Hong-Gee
    • Korean Journal of Cognitive Science / v.28 no.4 / pp.299-314 / 2017
  • Predicting compound-protein interactions in silico is significant for drug discovery. In this paper, we propose a scalable machine learning model to predict compound-protein interaction. The key ideas of this model are the pairwise neural network architecture and a method for embedding features from the raw data, especially for proteins. This method automatically extracts features without additional knowledge of the compound and protein. In addition, the pairwise architecture improves the expressiveness and compactness of the feature dimension by preventing the biased learning that arises from the dimension and type of features. Results of 5-fold cross-validation on a large-scale database show that the pairwise neural network improves the performance of compound-protein interaction prediction compared with previous prediction models.

Comparison of Univariate Kriging Algorithms for GIS-based Thematic Mapping with Ground Survey Data (현장 조사 자료를 이용한 GIS 기반 주제도 작성을 위한 단변량 크리깅 기법의 비교)

  • Park, No-Wook
    • Korean Journal of Remote Sensing / v.25 no.4 / pp.321-338 / 2009
  • The objective of this paper is to compare the spatial prediction capabilities of univariate kriging algorithms for generating GIS-based thematic maps from ground survey data with asymmetric distributions. Four univariate kriging algorithms, namely traditional ordinary kriging and three non-linear transform-based kriging algorithms (log-normal kriging, multi-Gaussian kriging and indicator kriging), are applied to the spatial interpolation of the geochemical elements As and Pb. Cross-validation based on a leave-one-out approach is applied and prediction errors are then computed. The impact of the sampling density of the ground survey data on the prediction errors is also investigated. In the case study, indicator kriging showed the smallest prediction errors and superior prediction of very low and very high values. The other non-linear transform-based kriging algorithms yielded better prediction capabilities than traditional ordinary kriging. Log-normal kriging, which has been widely applied, nevertheless produced biased estimates (overall, overestimation). Such quantitative comparison results are expected to be useful for selecting an optimal kriging algorithm for the spatial interpolation of ground survey data with asymmetric distributions.
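
The leave-one-out cross-validation used in this study can be illustrated with a small sketch. Kriging itself is not implemented here; inverse distance weighting is swapped in as a simple stand-in interpolator, and the four sample points, their values and the names `idw_predict` and `loo_errors` are assumptions for the example. The signed mean error indicates overall over- or underestimation, and the RMSE summarizes the error size.

```python
import math

def idw_predict(points, values, target, power=2.0):
    """Inverse distance weighting: a stand-in interpolator, not kriging."""
    num = 0.0
    den = 0.0
    for (x, y), v in zip(points, values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return v  # target coincides with a sample point
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den

def loo_errors(points, values, predictor):
    """Leave each sample out in turn, predict it from the remaining
    samples, and return the signed errors (prediction minus observation)."""
    errors = []
    for i in range(len(points)):
        rest_p = points[:i] + points[i + 1:]
        rest_v = values[:i] + values[i + 1:]
        errors.append(predictor(rest_p, rest_v, points[i]) - values[i])
    return errors

# Hypothetical survey data: four samples on the corners of a unit square.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [1.0, 2.0, 2.0, 3.0]
errors = loo_errors(points, values, idw_predict)
mean_error = sum(errors) / len(errors)  # > 0 suggests overestimation
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
print("errors:", errors, "mean error:", mean_error, "RMSE:", rmse)
```

Any interpolator with the same `(points, values, target)` signature could be passed to `loo_errors`, which is how competing algorithms could be compared on the same data.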