• Title, Summary, Keyword: variable selection

### Unified methods for variable selection and outlier detection in a linear regression

• Seo, Han Son
• Communications for Statistical Applications and Methods
• /
• v.26 no.6
• /
• pp.575-582
• /
• 2019
• The problem of selecting variables in the presence of outliers is considered. Variable selection and outlier detection are not separable problems, because each observation affects the fitted regression equation differently and has a different influence on each variable. We suggest a simultaneous method for variable selection and outlier detection in a linear regression model. The suggested procedure detects outliers sequentially and uses all possible subset regressions for model selection. A simplified version of the procedure is also proposed to reduce the computational burden. The procedures are compared to other variable selection methods using real data sets known to contain outliers. Examples show that the proposed procedures are effective and superior to robust algorithms in selecting the best model.
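
The all-possible-subsets step described above can be sketched as a best-subset search scored by an information criterion. BIC is an illustrative choice here, and the sequential outlier-detection step is omitted; this is a sketch, not the authors' exact procedure:

```python
import itertools
import numpy as np

def best_subset(X, y):
    """All-possible-subsets regression scored by BIC (an illustrative
    criterion; the paper's own selection rule may differ)."""
    n, p = X.shape
    best_score, best_set = np.inf, ()
    for k in range(1, p + 1):
        for S in itertools.combinations(range(p), k):
            Xs = X[:, S]
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = np.sum((y - Xs @ beta) ** 2)
            bic = n * np.log(rss / n) + k * np.log(n)  # fit + complexity
            if bic < best_score:
                best_score, best_set = bic, S
    return best_set
```

The exhaustive loop is exponential in the number of predictors, which is why the paper also proposes a simplified version to reduce the computational burden.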

### Regression Trees with Unbiased Variable Selection (An Algorithm for Constructing Regression Trees without Variable Selection Bias)

• 김진흠;김민호
• The Korean Journal of Applied Statistics
• /
• v.17 no.3
• /
• pp.459-473
• /
• 2004
• It is well known that the exhaustive search algorithm suggested by Breiman et al. (1984) tends to select, as a splitting variable, a variable having relatively many possible splits. We propose an algorithm to overcome this variable selection bias problem and then construct unbiased regression trees based on it. The proposed algorithm proceeds in two steps: selecting a split variable, and then determining a binary split rule based on the selected variable. Simulation studies were performed to compare the proposed algorithm with Breiman et al.'s (1984) CART (Classification and Regression Trees) in terms of degree of variable selection bias, variable selection power, and MSE (Mean Squared Error). We also illustrate the proposed algorithm with real data sets.
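
The two-step idea — choose the split variable with a per-variable score that does not favor variables with many candidate cut points, then search for the cut point only on that variable — can be sketched as follows. The association score used here (absolute correlation) is a simple stand-in, not necessarily the statistic used by the authors:

```python
import numpy as np

def two_step_split(X, y):
    """Two-step split selection: (1) pick the split variable by a
    per-variable association score that ignores how many candidate
    cut points the variable offers (absolute correlation is an
    illustrative choice); (2) search cut points only on that variable."""
    n, p = X.shape
    scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(p)]
    j = int(np.argmax(scores))
    # Step 2: best binary split on variable j by residual sum of squares.
    best_cut, best_rss = None, np.inf
    for c in np.unique(X[:, j])[:-1]:
        left, right = y[X[:, j] <= c], y[X[:, j] > c]
        rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if rss < best_rss:
            best_cut, best_rss = c, rss
    return j, best_cut
```

Because step 1 never compares cut-point searches across variables, a variable with many distinct values gains no advantage at the selection stage, which is the essence of the bias correction.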

### Variable selection in L1 penalized censored regression

• Hwang, Chang-Ha;Kim, Mal-Suk;Shim, Joo-Yong
• Journal of the Korean Data and Information Science Society
• /
• v.22 no.5
• /
• pp.951-959
• /
• 2011
• The proposed method is based on a penalized censored regression model with an L1 penalty. We use the iteratively reweighted least squares procedure to solve the L1 penalized log likelihood function of the censored regression model. It provides efficient computation of the regression parameters, including variable selection, and leads to the generalized cross validation function for model selection. Numerical results are then presented to indicate the performance of the proposed method.
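
The iteratively reweighted least squares idea for an L1 penalty can be sketched in the uncensored case via a local quadratic approximation of |β|; the censored log likelihood of the paper is not reproduced here:

```python
import numpy as np

def irls_lasso(X, y, lam, n_iter=50, eps=1e-8):
    """L1-penalized least squares via IRLS: at each step the penalty
    lam*|b_j| is replaced by the quadratic lam*b_j**2/(2*|b_j_old|),
    turning the problem into a weighted ridge solve."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS start
    for _ in range(n_iter):
        # Ridge-like system; weight lam/|beta_old| on each coefficient.
        D = np.diag(lam / (np.abs(beta) + eps))
        beta = np.linalg.solve(X.T @ X + D, X.T @ y)
    beta[np.abs(beta) < 1e-6] = 0.0  # hard-threshold vanishing coefficients
    return beta
```

Each iteration is a single linear solve, which is what makes the reweighting scheme computationally attractive for simultaneous estimation and variable selection.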

### Estimation and variable selection in censored regression model with smoothly clipped absolute deviation penalty

• Shim, Jooyong;Bae, Jongsig;Seok, Kyungha
• Journal of the Korean Data and Information Science Society
• /
• v.27 no.6
• /
• pp.1653-1660
• /
• 2016
• The smoothly clipped absolute deviation (SCAD) penalty is known to satisfy the desirable properties of penalty functions, such as unbiasedness, sparsity, and continuity. In this paper, we deal with regression function estimation and variable selection based on the SCAD penalized censored regression model. We use the local linear approximation and the iteratively reweighted least squares algorithm to solve the SCAD penalized log likelihood function. The proposed method provides an efficient means of variable selection and regression function estimation. The generalized cross validation function is presented for model selection. Applications of the proposed method are illustrated through simulated and real examples.
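
The SCAD penalty itself has a closed form (Fan and Li, 2001): linear near zero like the L1 penalty, a quadratic transition, and a constant tail that stops penalizing large coefficients, which is the source of its unbiasedness. A direct transcription:

```python
import numpy as np

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001); a=3.7 is their suggested default."""
    b = np.abs(np.asarray(beta, dtype=float))
    return np.where(
        b <= lam,
        lam * b,                                        # L1 region near zero
        np.where(
            b <= a * lam,
            -(b**2 - 2 * a * lam * b + lam**2) / (2 * (a - 1)),  # quadratic blend
            (a + 1) * lam**2 / 2,                       # constant tail: unbiased
        ),
    )
```

The three branches meet continuously at |β| = λ and |β| = aλ, giving the continuity property the abstract mentions.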

### Classification of High Dimensionality Data through Feature Selection Using Markov Blanket

• Lee, Junghye;Jun, Chi-Hyuck
• Industrial Engineering and Management Systems
• /
• v.14 no.2
• /
• pp.210-219
• /
• 2015
• A classification task requires an exponentially growing amount of computation time and number of observations as the variable dimensionality increases. Thus, reducing the dimensionality of the data is essential when the number of observations is limited. Often, dimensionality reduction or feature selection leads to better classification performance than using the full set of features. In this paper, we study the possibility of utilizing the Markov blanket discovery algorithm as a new feature selection method. The Markov blanket of a target variable is the minimal variable set needed to explain the target variable, based on the conditional independence relations among the variables connected in a Bayesian network. We apply several Markov blanket discovery algorithms to high-dimensional categorical and continuous data sets, and compare their classification performance with that of other feature selection methods using well-known classifiers.

### A Unit Selection Method Using a Variable Break in a Japanese TTS

• Na, Deok-Su;Bae, Myung-Jin
• Proceedings of the IEEK Conference
• /
• /
• pp.983-984
• /
• 2008
• This paper proposes a variable break that can offset prediction error, as well as a pre-selection method based on the variable break for enhanced unit selection. In Japanese, a sentence consists of several APs (accentual phrases) and MPs (major phrases), and the breaks between these phrases must be predicted to realize text-to-speech systems. An MP consists of several APs and plays a decisive role in making synthetic speech natural and understandable, because short pauses appear at its boundaries. The variable break is defined as a break that can change easily from an AP to an MP boundary, or from an MP to an AP boundary. Using CART (Classification and Regression Trees), the variable break is modeled stochastically, and candidate units are then pre-selected in the unit-selection process. The experimental results show that it was possible to compensate for break prediction errors and improve the naturalness of synthetic speech.

### Variable Selection Theorems in General Linear Model

• Yoon, Sang-Hoo;Park, Jeong-Soo
• Proceedings of the Korean Statistical Society Conference
• /
• /
• pp.187-192
• /
• 2005
• For the problem of variable selection in linear models, we consider errors that are correlated with covariance matrix V. Hocking's theorems on the effects of overfitting and underfitting in the linear model are extended to the less than full rank and correlated error model, and to the ANCOVA model.

### Variable Selection Theorems in General Linear Model

• Park, Jeong-Soo;Yoon, Sang-Hoo
• Proceedings of the Korean Data and Information Science Society Conference
• /
• /
• pp.171-179
• /
• 2006
• For the problem of variable selection in linear models, we consider errors that are correlated with covariance matrix V. Hocking's theorems on the effects of overfitting and underfitting in the linear model are extended to the less than full rank and correlated error model, and to the ANCOVA model.

### Bayesian Parameter Estimation and Variable Selection in Random Effects Generalised Linear Models for Count Data

• Oh, Man-Suk;Park, Tae-Sung
• Journal of the Korean Statistical Society
• /
• v.31 no.1
• /
• pp.93-107
• /
• 2002
• Random effects generalised linear models are useful for analysing clustered count data in which responses are usually correlated. We propose a Bayesian approach to parameter estimation and variable selection in random effects generalised linear models for count data. A simple Gibbs sampling algorithm for parameter estimation is presented, and simple, efficient variable selection is performed using the Gibbs outputs. An illustrative example is provided.

### Bias Reduction in Split Variable Selection in C4.5

• Shin, Sung-Chul;Jeong, Yeon-Joo;Song, Moon Sup
• Communications for Statistical Applications and Methods
• /
• v.10 no.3
• /
• pp.627-635
• /
• 2003
• In this short communication we discuss the bias problem of C4.5 in split variable selection and suggest a method to reduce the variable selection bias among categorical predictor variables. A penalty proportional to the number of categories is applied to the splitting criterion gain of C4.5. The results of empirical comparisons show that the proposed modification of C4.5 reduces the size of classification trees.
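
A penalty proportional to the number of categories can be sketched on top of the usual entropy-based gain. The penalty constant below is a hypothetical tuning choice for illustration, not the value used in the paper:

```python
from collections import Counter
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of a label sequence."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def penalized_gain(x, y, c=0.05):
    """Information gain of splitting y on categorical x, minus a penalty
    proportional to the number of categories; c is an illustrative
    constant, not the paper's."""
    n = len(y)
    cats = set(x)
    cond = 0.0
    for v in cats:
        idx = [i for i, xi in enumerate(x) if xi == v]
        cond += len(idx) / n * entropy([y[i] for i in idx])
    return (entropy(y) - cond) - c * (len(cats) - 1)
```

Under this score, a many-valued categorical predictor must deliver strictly more raw gain than a binary one to be chosen, which counteracts the selection bias toward high-cardinality variables.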