• Title/Abstract/Keywords: variable selection

874 search results

Bias Reduction in Split Variable Selection in C4.5

  • Shin, Sung-Chul; Jeong, Yeon-Joo; Song, Moon Sup
    • Communications for Statistical Applications and Methods / Vol. 10 No. 3 / pp.627-635 / 2003
  • In this short communication we discuss the bias problem of C4.5 in split variable selection and suggest a method to reduce the variable selection bias among categorical predictor variables. A penalty proportional to the number of categories is applied to the splitting criterion gain of C4.5. The results of empirical comparisons show that the proposed modification of C4.5 reduces the size of classification trees. (A rough sketch of such a category-penalized gain follows below.)
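
The snippet below is only a rough illustration of the idea of penalizing the C4.5 gain in proportion to the number of categories; the penalty constant `alpha` and the exact functional form are assumptions made for this sketch, not taken from the paper.

    import math
    from collections import Counter

    def entropy(labels):
        """Shannon entropy of a list of class labels."""
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(labels, split_values):
        """Entropy reduction obtained by splitting on a categorical variable."""
        n = len(labels)
        gain = entropy(labels)
        for v in set(split_values):
            subset = [y for y, x in zip(labels, split_values) if x == v]
            gain -= (len(subset) / n) * entropy(subset)
        return gain

    def penalized_gain(labels, split_values, alpha=0.05):
        """Hypothetical category-penalized criterion: subtract a term
        proportional to the number of categories of the split variable,
        so many-category variables are no longer automatically favored."""
        k = len(set(split_values))
        return information_gain(labels, split_values) - alpha * k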

Variable Selection Theorems in General Linear Model

  • Park, Jeong-Soo; Yoon, Sang-Hoo
    • Korean Data and Information Science Society: Conference Proceedings / Proceedings of the 2006 Joint Conference of KDISS and KDAS / pp.171-179 / 2006
  • For the problem of variable selection in linear models, we consider the case where the errors are correlated with covariance matrix V. Hocking's theorems on the effects of overfitting and underfitting in the linear model are extended to the less-than-full-rank and correlated-error model, and to the ANCOVA model. (The correlated-error setting is recalled below.)
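
As a reminder of the setting, a standard textbook formulation of the general linear model with correlated errors (not taken from the paper) is:

    % General linear model with correlated errors:
    %   y = X beta + e,  E(e) = 0,  Var(e) = sigma^2 V.
    % For a known positive definite V, the generalized least squares estimator is
    \[
    \hat{\beta}_{\mathrm{GLS}} = (X^{\top} V^{-1} X)^{-}\, X^{\top} V^{-1} y ,
    \]
    % where $(\cdot)^{-}$ denotes a generalized inverse, which accommodates the
    % less-than-full-rank design matrices considered above.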


Variable Selection Theorems in General Linear Model

  • Yoon, Sang-Hoo; Park, Jeong-Soo
    • The Korean Statistical Society: Conference Proceedings / Proceedings of the 2005 Fall Conference of the Korean Statistical Society / pp.187-192 / 2005
  • For the problem of variable selection in linear models, we consider the case where the errors are correlated with covariance matrix V. Hocking's theorems on the effects of overfitting and underfitting in the linear model are extended to the less-than-full-rank and correlated-error model, and to the ANCOVA model.


Bayesian Parameter Estimation and Variable Selection in Random Effects Generalised Linear Models for Count Data

  • Oh, Man-Suk; Park, Tae-Sung
    • Journal of the Korean Statistical Society / Vol. 31 No. 1 / pp.93-107 / 2002
  • Random effects generalised linear models are useful for analysing clustered count data in which responses are usually correlated. We propose a Bayesian approach to parameter estimation and variable selection in random effects generalised linear models for count data. A simple Gibbs sampling algorithm for parameter estimation is presented, and variable selection is carried out simply and efficiently using the Gibbs outputs. An illustrative example is provided. (A toy example of selecting variables from Gibbs output appears below.)
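
The toy code below shows one common way (assumed here, not necessarily the paper's rule) of turning Gibbs output into a variable selection: keep the variables whose posterior credible intervals exclude zero.

    import numpy as np

    def select_from_gibbs(beta_samples, level=0.95):
        """Given Gibbs draws of regression coefficients (n_samples x p),
        flag variables whose central credible interval excludes zero."""
        alpha = 1.0 - level
        lower = np.quantile(beta_samples, alpha / 2, axis=0)
        upper = np.quantile(beta_samples, 1 - alpha / 2, axis=0)
        return (lower > 0) | (upper < 0)   # True = variable kept

    # Example with fake draws: three coefficients, only the first clearly nonzero.
    rng = np.random.default_rng(0)
    draws = np.column_stack([
        rng.normal(2.0, 0.3, 5000),   # concentrated away from zero
        rng.normal(0.0, 0.3, 5000),   # straddles zero
        rng.normal(0.1, 0.3, 5000),   # straddles zero
    ])
    print(select_from_gibbs(draws))   # [ True False False ]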

Variable Selection with Nonconcave Penalty Function on Reduced-Rank Regression

  • Jung, Sang Yong; Park, Chongsun
    • Communications for Statistical Applications and Methods / Vol. 22 No. 1 / pp.41-54 / 2015
  • In this article, we propose nonconcave penalties on a reduced-rank regression model to select variables and estimate coefficients simultaneously. We apply the HARD (hard thresholding) and SCAD (smoothly clipped absolute deviation) penalty functions, which are symmetric, singular at the origin, and bounded by a constant to reduce bias. In our simulation study and real data analysis, the new method is compared with an existing variable selection method based on the $L_1$ penalty and shows competitive performance in prediction and variable selection. Instead of using only one type of penalty function, we use two or three penalty functions simultaneously and take advantage of their different properties to select relevant predictors and estimate coefficients, improving the overall performance of model fitting. (The two penalty functions are sketched below.)
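
For reference, the standard forms of the two penalties named above (Fan and Li's SCAD and the hard-thresholding penalty) are sketched here; the constant a = 3.7 is the conventional default and an assumption of this sketch.

    import numpy as np

    def scad_penalty(theta, lam, a=3.7):
        """SCAD penalty value p_lam(|theta|) in its standard textbook form
        (Fan & Li, 2001); a = 3.7 is the conventional default."""
        t = np.abs(theta)
        small = lam * t
        mid = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
        large = lam**2 * (a + 1) / 2
        return np.where(t <= lam, small, np.where(t <= a * lam, mid, large))

    def hard_penalty(theta, lam):
        """HARD (hard-thresholding) penalty: lam^2 - (|theta| - lam)^2
        on |theta| < lam, constant lam^2 beyond it."""
        t = np.abs(theta)
        return np.where(t < lam, lam**2 - (t - lam)**2, lam**2)

    # Both penalties are singular at the origin (promoting sparsity) and flatten
    # out to a constant, so large coefficients are not shrunk (reducing bias).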

Variable Selection with Regression Trees

  • Chang, Young-Jae
    • The Korean Journal of Applied Statistics / Vol. 23 No. 2 / pp.357-366 / 2010
  • Many tree algorithms have been developed for regression problems. Although they are regarded as good algorithms, most of them suffer a loss of prediction accuracy when there are many noise variables. To handle this problem, we propose the multi-step GUIDE, a regression tree algorithm with a variable selection process. Simulation results show that the multi-step GUIDE outperforms well-known algorithms such as Random Forest and MARS in terms of variable selection and prediction accuracy: it generally selects the important variables correctly while admitting relatively few noise variables, and in turn gives good prediction accuracy. (A generic two-step screening sketch, not GUIDE itself, follows below.)
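
The sketch below is a generic stand-in for the idea of screening variables before fitting the final tree; it is not the GUIDE algorithm, and the screening rule (impurity-based importances from a preliminary tree) is purely illustrative.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def two_step_tree(X, y, keep=5):
        """Generic two-step procedure: screen variables, then refit a tree
        on the survivors. X is an (n, p) numpy array, y a length-n array."""
        # Step 1: rank variables by the impurity-based importances of a
        # shallow preliminary tree and keep the top `keep` of them.
        pre = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y)
        selected = np.argsort(pre.feature_importances_)[::-1][:keep]
        # Step 2: refit the regression tree using only the selected variables.
        final = DecisionTreeRegressor(random_state=0).fit(X[:, selected], y)
        return final, selected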

A convenient approach for penalty parameter selection in robust lasso regression

  • Kim, Jongyoung; Lee, Seokho
    • Communications for Statistical Applications and Methods / Vol. 24 No. 6 / pp.651-662 / 2017
  • We propose an alternative procedure for selecting the penalty parameter in $L_1$-penalized robust regression. The procedure is based on marginalizing a prior distribution over the penalty parameter, so the resulting objective function no longer contains the penalty parameter; in addition, the estimating algorithm automatically chooses a penalty parameter using the previous estimate of the regression coefficients. The proposed approach bypasses cross-validation and saves computing time, and variable-wise penalization performs best in terms of prediction and variable selection. Numerical studies using simulated data demonstrate the performance of our proposals, and the methods are applied to the Boston housing data. Through the simulation study and the real data application we demonstrate that our proposals are competitive with, or much better than, cross-validation in terms of prediction, variable selection, and computing time. (The marginalization idea is sketched below.)
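
A back-of-the-envelope calculation shows why marginalizing the penalty parameter removes it from the objective; the Gamma hyperprior below is an assumption made purely for illustration and may differ from the paper's choice.

    % Illustration only: assume a Laplace-type coefficient prior
    %   pi(beta | lambda) ∝ lambda^p exp(-lambda * sum_j |beta_j|)
    % and a Gamma(a, b) hyperprior on lambda. Integrating lambda out gives
    \[
    \pi(\beta) \;\propto\; \int_{0}^{\infty} \lambda^{p+a-1}
      \exp\!\Bigl(-\lambda \bigl(b + \textstyle\sum_{j}|\beta_j|\bigr)\Bigr)\, d\lambda
      \;\propto\; \Bigl(b + \sum_{j}|\beta_j|\Bigr)^{-(p+a)} ,
    \]
    % so the negative log-prior, (p+a) log(b + sum_j |beta_j|), penalizes the
    % coefficients without any remaining tuning parameter lambda.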

Efficient estimation and variable selection for partially linear single-index-coefficient regression models

  • Kim, Young-Ju
    • Communications for Statistical Applications and Methods / Vol. 26 No. 1 / pp.69-78 / 2019
  • A structured model with both a single index and varying coefficients is a powerful tool for modeling high-dimensional data. It has been widely used because the single index can overcome the curse of dimensionality and the varying coefficients allow nonlinear interaction effects in the model. For high-dimensional index vectors, variable selection becomes an important question in the model-building process. In this paper, we propose an efficient estimation and variable selection method based on a smoothing spline approach in a partially linear single-index-coefficient regression model. We also propose an efficient algorithm that simultaneously estimates the coefficient functions in a data-adaptive lower-dimensional approximation space and selects significant variables in the index with the adaptive LASSO penalty. The empirical performance of the proposed method is illustrated with simulated and real data examples. (The adaptive LASSO penalty is recalled below.)
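
For reference, the adaptive LASSO penalty mentioned above in its standard form (Zou, 2006); the weights shown are the usual choice and an assumption of this sketch, not necessarily the exact weights used in the paper.

    % theta denotes the index coefficients and theta-tilde an initial
    % consistent estimate:
    \[
    P_{\lambda}(\theta) = \lambda \sum_{j=1}^{p} w_j\, |\theta_j| ,
    \qquad w_j = \frac{1}{|\tilde{\theta}_j|^{\gamma}}, \quad \gamma > 0 ,
    \]
    % so variables with small initial estimates receive large weights and are
    % shrunk to exactly zero, while strong variables are penalized only lightly.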

Forecasting the Baltic Dry Index Using Bayesian Variable Selection

  • 한상우; 김영민
    • Korea Trade Review (무역학회지) / Vol. 47 No. 5 / pp.21-37 / 2022
  • The Baltic Dry Index (BDI) is difficult to forecast because of its high volatility and complexity. To improve BDI forecasting, this study applies a Bayesian variable selection method with a large number of predictors. Our estimation results, based on the BDI and all predictors from January 2000 to September 2021, indicate that the out-of-sample prediction ability of the ADL model with variable selection is superior to that of the AR model in terms of both point and density forecasting. We also find that the critical predictors for the BDI change over the forecast horizon. The lagged BDI is selected as a key predictor at all forecast horizons, while commodity prices, the ClarkSea Index, and interest rates carry additional information for predicting the BDI at mid-term horizons. This implies that time variation in the predictors should be considered when predicting the BDI. (A generic ADL-with-indicators specification is sketched below.)
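
A generic form of an autoregressive distributed lag (ADL) forecasting regression with inclusion indicators is shown below; the indicator device is a common Bayesian variable selection formulation and is an assumption here, not the paper's exact specification.

    % h-step-ahead ADL regression with variable-inclusion indicators gamma_j:
    \[
    \mathrm{BDI}_{t+h} = \alpha + \sum_{i=1}^{p} \phi_i\, \mathrm{BDI}_{t-i+1}
      + \sum_{j=1}^{K} \gamma_j \beta_j\, x_{j,t} + \varepsilon_{t+h},
    \qquad \gamma_j \in \{0,1\},
    \]
    % where the x_{j,t} are candidate predictors (e.g. commodity prices, the
    % ClarkSea Index, interest rates) and the posterior distribution of the
    % indicators gamma_j determines which predictors are selected at each horizon h.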

Fast Frame Selection Method for Multi-Reference and Variable Block Motion Estimation

  • 김성대; 선우명훈
    • Journal of the Institute of Electronics Engineers of Korea - SP (대한전자공학회논문지SP) / Vol. 45 No. 6 / pp.1-8 / 2008
  • This paper introduces three reference frame selection methods that efficiently reduce the computational load of multi-reference and variable block motion estimation. The proposed RSP (Reference Selection Pass) method minimizes the extra computation needed for reference frame selection, and the MFS (Modified Frame Selection) method takes the motion of the image into account during the selection process, reducing the number of operations for reference frame selection by 17% compared with the conventional approach. In addition, the TPRFS (Two Pass Reference frame Selection) method minimizes the additional computation required to support the variable block motion estimation of H.264/AVC by exploiting the characteristics of the reference frames selected for each block size. Experimental results show that the proposed methods reduce the computation of motion estimation by more than 50% compared with conventional methods, without any degradation in image quality. Moreover, because the proposed reference frame selection methods run separately from the block matching stage, which is the main computation of motion estimation, they can be combined with any existing single-reference fast motion search method to support multi-reference and variable block motion estimation efficiently. (A generic sketch of multi-reference block matching with a pluggable frame pre-selection step follows below.)
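
The sketch below is a generic illustration of multi-reference block matching with a pluggable reference-frame pre-selection step; it is not the RSP, MFS, or TPRFS method, and the function names and parameters are hypothetical.

    import numpy as np

    def sad(block, candidate):
        """Sum of absolute differences between two equally sized blocks."""
        return np.abs(block.astype(np.int32) - candidate.astype(np.int32)).sum()

    def motion_estimate(block, refs, top, left, search=8, preselect=None):
        """Full-search block matching over a list of reference frames.
        `preselect` can prune the reference list before the expensive search,
        which is where reference frame selection methods save computation."""
        if preselect is not None:
            refs = preselect(block, refs)
        h, w = block.shape
        best = (float("inf"), None, None)   # (cost, ref_index, motion_vector)
        for r, ref in enumerate(refs):
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = top + dy, left + dx
                    if 0 <= y <= ref.shape[0] - h and 0 <= x <= ref.shape[1] - w:
                        cost = sad(block, ref[y:y + h, x:x + w])
                        if cost < best[0]:
                            best = (cost, r, (dy, dx))
        return best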