• Title/Summary/Keyword: variable selection

Bias Reduction in Split Variable Selection in C4.5

  • Shin, Sung-Chul;Jeong, Yeon-Joo;Song, Moon Sup
    • Communications for Statistical Applications and Methods
    • /
    • v.10 no.3
    • /
    • pp.627-635
    • /
    • 2003
  • In this short communication we discuss the bias problem of C4.5 in split variable selection and suggest a method to reduce the variable selection bias among categorical predictor variables. A penalty proportional to the number of categories is applied to the splitting criterion gain of C4.5. The results of empirical comparisons show that the proposed modification of C4.5 reduces the size of classification trees.
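The abstract describes penalizing the C4.5 gain criterion in proportion to the number of categories of a predictor. A minimal sketch of that idea, assuming a simple linear penalty `alpha * (number of categories)` (the paper's exact penalty form is not given in the abstract, and the function names here are hypothetical):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def penalized_gain(labels, split_values, alpha=0.05):
    """Information gain of a categorical split, minus a penalty proportional
    to the number of categories.  The linear form alpha * k is an assumption
    for illustration; many-category predictors are penalized more heavily,
    which counteracts the selection bias toward them."""
    n = len(labels)
    groups = {}
    for y, v in zip(labels, split_values):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    return gain - alpha * len(groups)
```

For a perfect two-category split of balanced binary labels, the raw gain is 1 bit and the penalized gain is `1 - alpha * 2`.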

Variable Selection Theorems in General Linear Model

  • Park, Jeong-Soo;Yoon, Sang-Hoo
    • Proceedings of the Korean Data and Information Science Society Conference
    • /
    • 2006.04a
    • /
    • pp.171-179
    • /
    • 2006
  • For the problem of variable selection in linear models, we consider the case where the errors are correlated with covariance matrix V. Hocking's theorems on the effects of overfitting and underfitting in the linear model are extended to the less-than-full-rank and correlated-error model, and to the ANCOVA model.

Variable Selection Theorems in General Linear Model

  • Yoon, Sang-Hoo;Park, Jeong-Soo
    • Proceedings of the Korean Statistical Society Conference
    • /
    • 2005.11a
    • /
    • pp.187-192
    • /
    • 2005
  • For the problem of variable selection in linear models, we consider the case where the errors are correlated with covariance matrix V. Hocking's theorems on the effects of overfitting and underfitting in the linear model are extended to the less-than-full-rank and correlated-error model, and to the ANCOVA model.

Bayesian Parameter Estimation and Variable Selection in Random Effects Generalised Linear Models for Count Data

  • Oh, Man-Suk;Park, Tae-Sung
    • Journal of the Korean Statistical Society
    • /
    • v.31 no.1
    • /
    • pp.93-107
    • /
    • 2002
  • Random effects generalised linear models are useful for analysing clustered count data in which responses are usually correlated. We propose a Bayesian approach to parameter estimation and variable selection in random effects generalised linear models for count data. A simple Gibbs sampling algorithm for parameter estimation is presented, and variable selection is carried out simply and efficiently using the Gibbs outputs. An illustrative example is provided.
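The abstract says variable selection is done using the Gibbs outputs. A common device for this, sketched here as an assumption (the paper's exact rule is not stated in the abstract), is to sample 0/1 inclusion indicators in the Gibbs run and estimate each variable's posterior inclusion probability as the mean of its indicator draws:

```python
import numpy as np

def inclusion_probs(gamma_samples):
    """Posterior inclusion probabilities from Gibbs output.

    gamma_samples: array-like of shape (n_draws, p) holding 0/1 inclusion
    indicators for p candidate variables.  Selecting variables whose
    probability exceeds 0.5 is the usual median-probability rule; this is
    an illustrative convention, not necessarily the authors' rule."""
    return np.asarray(gamma_samples, dtype=float).mean(axis=0)
```

For example, four draws `[[1,0],[1,1],[1,0],[0,0]]` give probabilities 0.75 and 0.25, so only the first variable passes the 0.5 threshold.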

Variable Selection with Nonconcave Penalty Function on Reduced-Rank Regression

  • Jung, Sang Yong;Park, Chongsun
    • Communications for Statistical Applications and Methods
    • /
    • v.22 no.1
    • /
    • pp.41-54
    • /
    • 2015
  • In this article, we propose nonconcave penalties on a reduced-rank regression model to select variables and estimate coefficients simultaneously. We apply the HARD (hard thresholding) and SCAD (smoothly clipped absolute deviation) penalty functions, which are symmetric, singular at the origin, and bounded by a constant to reduce bias. In our simulation study and real data analysis, the new method is compared with an existing variable selection method using the $L_1$ penalty and exhibits competitive performance in prediction and variable selection. Instead of using only one type of penalty function, we use two or three penalty functions simultaneously, taking advantage of their different properties to select relevant predictors and estimate coefficients so as to improve the overall performance of model fitting.
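The SCAD and HARD penalties named in the abstract have standard closed forms (SCAD is due to Fan and Li, 2001). Both are singular at the origin, which allows exact zeros, and flatten out to a constant for large coefficients, which reduces bias relative to the $L_1$ penalty. A direct transcription of those standard forms:

```python
def scad_penalty(t, lam=1.0, a=3.7):
    """SCAD penalty: linear (slope lam) near zero, quadratic blend on
    (lam, a*lam], and constant (a+1)*lam^2/2 beyond a*lam.  a=3.7 is the
    value Fan and Li recommend from their Bayes-risk argument."""
    t = abs(t)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1))
    return (a + 1) * lam ** 2 / 2

def hard_penalty(t, lam=1.0):
    """HARD thresholding penalty: lam^2 - (|t| - lam)^2 for |t| < lam,
    constant lam^2 otherwise."""
    t = abs(t)
    return lam ** 2 - (t - lam) ** 2 if t < lam else lam ** 2
```

Note that both functions are continuous at their breakpoints: for SCAD with `lam=1`, the value at `t=1` is 1 from either branch, and the penalty saturates at `(a+1)/2 = 2.35` for large `t`.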

Variable Selection with Regression Trees

  • Chang, Young-Jae
    • The Korean Journal of Applied Statistics
    • /
    • v.23 no.2
    • /
    • pp.357-366
    • /
    • 2010
  • Many tree algorithms have been developed for regression problems. Although they are regarded as good algorithms, most of them suffer from loss of prediction accuracy when there are many noise variables. To handle this problem, we propose the multi-step GUIDE, a regression tree algorithm with a variable selection process. The results of our simulation study show that the multi-step GUIDE outperforms well-known algorithms such as Random Forest and MARS in terms of variable selection and prediction accuracy. It generally selects the important variables correctly with relatively few noise variables and consequently gives good prediction accuracy.

A convenient approach for penalty parameter selection in robust lasso regression

  • Kim, Jongyoung;Lee, Seokho
    • Communications for Statistical Applications and Methods
    • /
    • v.24 no.6
    • /
    • pp.651-662
    • /
    • 2017
  • We propose an alternative procedure for selecting the penalty parameter in $L_1$-penalized robust regression. The procedure is based on marginalizing a prior distribution over the penalty parameter, so the resulting objective function does not include the penalty parameter. In addition, the estimating algorithm automatically chooses a penalty parameter using the previous estimate of the regression coefficients. The proposed approach bypasses cross-validation and thus saves computing time. Variable-wise penalization also performs best in terms of prediction and variable selection. Through a simulation study and an application to the Boston housing data, we demonstrate that our proposals are competitive with, or much better than, cross-validation in terms of prediction, variable selection, and computing time.
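The abstract's key idea is that the algorithm updates the penalty parameter from the previous coefficient estimates instead of cross-validating it. A toy sketch of that flavor, and only the flavor, follows: it is not the authors' algorithm (in particular, it uses a plain squared-error lasso via coordinate descent, not robust regression), and the gamma-prior-style update `lam = (p + a) / (sum|beta| + b)` is an assumed form chosen for illustration:

```python
import numpy as np

def auto_lambda_lasso(X, y, a=1.0, b=1.0, n_iter=50):
    """Lasso by coordinate descent with soft-thresholding, where the
    penalty parameter is refreshed each sweep from the current coefficient
    estimates (a hypothetical update echoing a marginalized gamma prior).
    This is an illustrative sketch, not the paper's robust procedure."""
    n, p = X.shape
    beta = np.zeros(p)
    lam = 1.0
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding coordinate j
            r = y - X @ beta + X[:, j] * beta[j]
            z = X[:, j] @ r
            beta[j] = np.sign(z) * max(abs(z) - lam, 0.0) / col_ss[j]
        # penalty chosen from the previous coefficient estimates
        lam = (p + a) / (np.abs(beta).sum() + b)
    return beta, lam
```

On simulated data with a sparse true coefficient vector, the loop recovers the large coefficients while shrinking the irrelevant ones toward zero, with no cross-validation grid anywhere.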

Efficient estimation and variable selection for partially linear single-index-coefficient regression models

  • Kim, Young-Ju
    • Communications for Statistical Applications and Methods
    • /
    • v.26 no.1
    • /
    • pp.69-78
    • /
    • 2019
  • A structured model with both single-index and varying coefficients is a powerful tool in modeling high dimensional data. It has been widely used because the single-index can overcome the curse of dimensionality and varying coefficients can allow nonlinear interaction effects in the model. For high dimensional index vectors, variable selection becomes an important question in the model building process. In this paper, we propose an efficient estimation and a variable selection method based on a smoothing spline approach in a partially linear single-index-coefficient regression model. We also propose an efficient algorithm for simultaneously estimating the coefficient functions in a data-adaptive lower-dimensional approximation space and selecting significant variables in the index with the adaptive LASSO penalty. The empirical performance of the proposed method is illustrated with simulated and real data examples.

Forecasting the Baltic Dry Index Using Bayesian Variable Selection (베이지안 변수선택 기법을 이용한 발틱건화물운임지수(BDI) 예측)

  • Xiang-Yu Han;Young Min Kim
    • Korea Trade Review
    • /
    • v.47 no.5
    • /
    • pp.21-37
    • /
    • 2022
  • The Baltic Dry Index (BDI) is difficult to forecast because of its high volatility and complexity. To improve BDI forecasting ability, this study applies a Bayesian variable selection method with a large number of predictors. Our estimation results, based on the BDI and all predictors from January 2000 to September 2021, indicate that the out-of-sample prediction ability of the ADL model with variable selection is superior to that of the AR model in terms of both point and density forecasting. We also find that the critical predictors for the BDI change over the forecast horizon. The lagged BDI is selected as a key predictor at all forecast horizons, but commodity prices, the ClarkSea Index, and interest rates carry additional information for predicting the BDI at the mid-term horizon. This implies that time variation in the predictors should be considered when forecasting the BDI.

Fast Frame Selection Method for Multi-Reference and Variable Block Motion Estimation (다중참조 및 가변블록 움직임 추정을 위한 고속 참조영상 선택 방법)

  • Kim, Sung-Dae;SunWoo, Myung-Hoon
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.45 no.6
    • /
    • pp.1-8
    • /
    • 2008
  • This paper introduces three efficient frame selection schemes to reduce the computational complexity of multi-reference and variable block size Motion Estimation (ME). The proposed RSP (Reference Selection Pass) scheme can minimize the overhead of frame selection. The MFS (Modified Frame Selection) scheme can reduce the number of search points by about 18% compared with existing schemes by considering the motion of the image during the reference frame selection process. In addition, the TPRFS (Two-Pass Reference Frame Selection) scheme can minimize the frame selection operations for variable block size ME in H.264/AVC by using the characteristics of the selected reference frame according to block size. The simulation results show that the proposed schemes can save up to 50% of the ME computation without degradation of image quality. Because the proposed schemes can be separated from the block matching process, they can be used with any existing single-reference fast search algorithm.