Evaluating Variable Selection Techniques for Multivariate Linear Regression

Ryu, Nahyeon; Kim, Hyungseok; Kang, Pilsung

doi:10.7232/JKIIE.2016.42.5.314

대한산업공학회지 (Journal of Korean Institute of Industrial Engineers)

Vol. 42, No. 5 / pp. 314-326 / 2016 / 1225-0988 (pISSN) / 2234-6457 (eISSN)

대한산업공학회 (Korean Institute of Industrial Engineers)

Ryu, Nahyeon (School of Industrial Management Engineering, Korea University)
Kim, Hyungseok (School of Industrial Management Engineering, Korea University)
Kang, Pilsung (School of Industrial Management Engineering, Korea University)

Received: 2016.06.10
Reviewed: 2016.10.04
Published: 2016.10.15

Abstract

Variable selection techniques aim to select a subset of relevant variables for a particular learning algorithm in order to improve both the accuracy and the efficiency of the prediction model. We conduct an empirical analysis to evaluate and compare seven well-known variable selection techniques for the multiple linear regression model, one of the most commonly used regression models in practice. The techniques we apply are forward selection, backward elimination, stepwise selection, genetic algorithm (GA), ridge regression, lasso (Least Absolute Shrinkage and Selection Operator), and elastic net. In experiments on 49 regression data sets, GA yielded the lowest error rates, while lasso reduced the number of variables the most. In terms of computational efficiency, forward selection, backward elimination, and lasso required less time than the other techniques.
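
To make two of the compared techniques concrete, the following is a minimal sketch of forward selection and lasso applied to a small synthetic regression problem. It is not the authors' experimental code. The data, the function names (`forward_selection`, `lasso_cd`), the use of hold-out validation MSE as the selection criterion, and the penalty value `lam=0.3` are all assumptions made for illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def mse(X, y, coef):
    """Mean squared error of an intercept-plus-linear model."""
    A = np.column_stack([np.ones(len(X)), X])
    return float(np.mean((y - A @ coef) ** 2))

def forward_selection(X_tr, y_tr, X_va, y_va):
    """Greedily add the variable that most lowers validation MSE.

    Stops when no remaining variable improves the validation error
    (an assumed stopping rule; other criteria are possible).
    """
    selected, remaining = [], list(range(X_tr.shape[1]))
    best = float(np.mean((y_va - y_tr.mean()) ** 2))  # intercept-only baseline
    while remaining:
        scores = []
        for j in remaining:
            cols = selected + [j]
            coef = fit_ols(X_tr[:, cols], y_tr)
            scores.append((mse(X_va[:, cols], y_va, coef), j))
        score, j = min(scores)
        if score >= best:
            break
        best = score
        selected.append(j)
        remaining.remove(j)
    return selected, best

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent on standardized inputs.

    Minimises (1/2n) * ||y_c - Z b||^2 + lam * ||b||_1, where Z is X with
    zero-mean, unit-variance columns and y_c is centred y. Coefficients
    are returned on the standardized scale.
    """
    n, p = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    b = np.zeros(p)
    r = y - y.mean()  # current residual
    for _ in range(n_iter):
        for j in range(p):
            r += Z[:, j] * b[j]            # remove variable j's contribution
            rho = Z[:, j] @ r / n          # univariate least-squares coefficient
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0)  # soft-threshold
            r -= Z[:, j] * b[j]            # add the updated contribution back
    return b

# Synthetic data (assumed): only variables 0 and 3 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=200)
X_tr, X_va, y_tr, y_va = X[:140], X[140:], y[:140], y[140:]

selected, val_mse = forward_selection(X_tr, y_tr, X_va, y_va)
lasso_coef = lasso_cd(X_tr, y_tr, lam=0.3)
lasso_support = np.flatnonzero(lasso_coef).tolist()

print("forward selection kept:", selected, "validation MSE:", val_mse)
print("lasso kept:", lasso_support)
```

Forward selection is a wrapper-style search that refits the model for each candidate variable, whereas lasso performs selection as a by-product of a single penalized fit. In this sketch the number of variables lasso retains is controlled by `lam`; in practice that value would be tuned, for example by cross-validation.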
