
Panel data analysis with regression trees

  • Chang, Youngjae (Department of Information Statistics, Korea National Open University)
  • Received : 2014.08.10
  • Accepted : 2014.09.18
  • Published : 2014.11.30

Abstract

A regression tree is a nonparametric, tree-structured method. It recursively partitions the predictor space and fits a simple regression model to the data in each resulting node, which gives the best prediction of the response within each region. Since regression trees were proposed, tree algorithms have been extended to more flexible models such as logistic regression trees and quantile regression trees. More recently, they have been extended to panel data analysis. Examples are the RE-EM algorithm of Sela and Simonoff (2012) and the extension of GUIDE by Loh and Zheng (2013). This paper briefly introduces these algorithms and reviews their characteristics. It also compares the prediction accuracy of three methods on simulated data using the mean squared error (MSE). In the simulation study, RE-EM generally shows the best prediction accuracy, with the lowest MSEs. A RE-EM tree was fitted to industry-level business survey index (BSI) panel data. It shows that sales performance (sales BSI) is the main factor affecting business entrepreneurs' sentiment about current business conditions. Within the relatively high-sales group, the economic sentiment BSI of non-manufacturing industries is higher than that of manufacturing industries. In other words, non-manufacturing firms judge business conditions more positively.
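The recursive-partitioning and RE-EM ideas summarized above can be illustrated with a minimal Python sketch. This is not the authors' code, and it simplifies the method in several ways. It assumes a single random intercept per subject and a least-squares (CART-style) tree grown to a fixed depth. It also assumes a fixed variance ratio `var_ratio` (error variance over random-intercept variance). The original RE-EM algorithm instead fits a linear mixed model at each iteration to estimate the random effects and variance components. All function names (`best_split`, `grow_tree`, `predict`, `re_em`, `mse`) are hypothetical.

```python
from collections import defaultdict


def best_split(xs, ys, min_size):
    """Best threshold on one predictor by least squares: (sse, threshold) or None."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total = sum(y for _, y in pairs)
    total_sq = sum(y * y for _, y in pairs)
    left_sum = left_sq = 0.0
    best = None
    for i in range(1, n):
        x_prev, y_prev = pairs[i - 1]
        left_sum += y_prev
        left_sq += y_prev * y_prev
        # Split only between distinct x values, keeping both children >= min_size.
        if pairs[i][0] == x_prev or i < min_size or n - i < min_size:
            continue
        right_sum, right_sq = total - left_sum, total_sq - left_sq
        sse = (left_sq - left_sum ** 2 / i) + (right_sq - right_sum ** 2 / (n - i))
        if best is None or sse < best[0]:
            best = (sse, (x_prev + pairs[i][0]) / 2)
    return best


def grow_tree(X, y, depth=0, max_depth=3, min_size=5):
    """Recursive partitioning: a leaf is the node mean, a split is (j, t, left, right)."""
    mean = sum(y) / len(y)
    parent_sse = sum((v - mean) ** 2 for v in y)
    if depth >= max_depth or len(y) < 2 * min_size:
        return mean
    best = None
    for j in range(len(X[0])):
        cand = best_split([row[j] for row in X], y, min_size)
        if cand and (best is None or cand[0] < best[0]):
            best = (cand[0], j, cand[1])
    if best is None or parent_sse - best[0] < 1e-9:  # no split reduces the SSE
        return mean
    _, j, t = best
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]

    def child(idx):
        return grow_tree([X[i] for i in idx], [y[i] for i in idx],
                         depth + 1, max_depth, min_size)

    return (j, t, child(left), child(right))


def predict(tree, row):
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree


def re_em(X, y, groups, var_ratio=1.0, n_iter=20, **tree_args):
    """Simplified RE-EM-style alternation with one random intercept per subject.

    Assumption: var_ratio = sigma_e^2 / sigma_b^2 is held fixed instead of being
    re-estimated by a mixed model as in the original algorithm.
    """
    b = defaultdict(float)  # random intercepts, initialised at zero
    tree = None
    for _ in range(n_iter):
        # Step 1: grow a tree on the responses with current random effects removed.
        tree = grow_tree(X, [yi - b[g] for yi, g in zip(y, groups)], **tree_args)
        # Step 2: random intercept = shrunken mean residual per subject (BLUP form).
        resid = defaultdict(list)
        for row, yi, g in zip(X, y, groups):
            resid[g].append(yi - predict(tree, row))
        b = defaultdict(float, {g: sum(r) / (len(r) + var_ratio)
                                for g, r in resid.items()})
    return tree, b


def mse(X, y, groups, tree, b=None):
    """Mean squared error of tree predictions, optionally adding random intercepts."""
    b = b or {}
    errs = [(yi - predict(tree, row) - b.get(g, 0.0)) ** 2
            for row, yi, g in zip(X, y, groups)]
    return sum(errs) / len(errs)


if __name__ == "__main__":
    # Toy panel: 10 subjects x 6 periods, a step at x = 0.5 plus a subject effect.
    groups = [g for g in range(10) for _ in range(6)]
    X = [[t / 5] for _ in range(10) for t in range(6)]
    y = [3.0 * (row[0] > 0.5) + (g - 4.5) for row, g in zip(X, groups)]
    tree, b = re_em(X, y, groups)
    print(mse(X, y, groups, tree), mse(X, y, groups, tree, b))
```

On this toy panel the tree recovers the split at x = 0.5. Adding the estimated subject intercepts lowers the in-sample MSE well below the tree-only MSE. This illustrates why modelling the subject effect matters for panel data. The fixed `var_ratio` is the main simplification. A faithful implementation would estimate it from the data at every iteration.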

References

  1. Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984). Classification and regression trees. Wadsworth, Belmont, CA.
  2. Cappelli, C. and Iorio, D. (2010). Detecting contemporaneous mean co-breaking via ART and PCA. Quaderni di STATISTICA, 12, 169-184.
  3. Chang, Y. and Kim, H. (2011). Tree-structured nonlinear regression. The Korean Journal of Applied Statistics, 24, 759-768. https://doi.org/10.5351/KJAS.2011.24.5.759
  4. Charbonneau, K. B. (2014). Multiple fixed effects in binary response panel data models. Bank of Canada Working Paper, Bank of Canada, Canada.
  5. Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association, 74, 829-836. https://doi.org/10.1080/01621459.1979.10481038
  6. De'ath, G. (2002). Multivariate regression trees: A new technique for modeling species-environment relationships. Ecology, 83, 1105-1117.
  7. De'ath, G. (2013). mvpart: Multivariate partitioning, R package version 1.6-1. Available from http://CRAN.R-project.org/package=mvpart.
  8. Dzeroski, S. and Zenko, B. (2004). Is combining classifiers with stacking better than selecting the best one? Machine Learning, 54, 255-273. https://doi.org/10.1023/B:MACH.0000015881.36452.6e
  9. Jo, J. and Chang, U. J. (2013). A statistical analysis of the fat mass repeated measures data using mixed model. Journal of the Korean Data & Information Science Society, 24, 303-310. https://doi.org/10.7465/jkdi.2013.24.2.303
  10. Lee, S. K. (2005). On generalized multivariate decision tree by using GEE. Computational Statistics and Data Analysis, 49, 1105-1119. https://doi.org/10.1016/j.csda.2004.07.003
  11. Loh, W.-Y. (2002). Regression trees with unbiased variable selection and interaction detection. Statistica Sinica, 12, 361-386.
  12. Loh, W.-Y. and Zheng, W. (2013). Regression trees for longitudinal and multiresponse data. The Annals of Applied Statistics, 7, 495-522. https://doi.org/10.1214/12-AOAS596
  13. Meek, C., Chickering, D. M. and Heckerman, D. (2002). Autoregressive tree models for time-series analysis. Proceedings of the Second International SIAM Conference on Data Mining, 229-244.
  14. Rea, W. S., Reale, M., Cappelli, C. and Brown, J. A. (2010). Identification of changes in mean with regression trees: An application to market research. Econometric Reviews, 29, 754-777. https://doi.org/10.1080/07474938.2010.482001
  15. Segal, M. R. (1992). Tree structured methods for longitudinal data. Journal of the American Statistical Association, 87, 407-418. https://doi.org/10.1080/01621459.1992.10475220
  16. Sela, R. J. and Simonoff, J. S. (2012). RE-EM trees: A data mining approach for longitudinal and clustered data. Machine Learning, 86, 169-207. https://doi.org/10.1007/s10994-011-5258-3
  17. Zhang, H. (1998). Classification trees for multiple binary responses. Journal of the American Statistical Association, 93, 180-193. https://doi.org/10.1080/01621459.1998.10474100
