A study on log-density with log-odds graph for variable selection in logistic regression

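  • Kahng, Myung-Wook (Department of Statistics, Sookmyung Women's University) ;
  • Shin, Eun-Young (Department of Statistics, Sookmyung Women's University)
  • Received : 2011.12.06
  • Reviewed : 2012.01.03
  • Published : 2012.01.31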

Abstract

The log-density ratio of the conditional densities of the predictors given the response variable provides useful information for variable selection in the logistic regression model: which predictors are needed and in what form they should enter the model. If the conditional distributions are skewed, they can reasonably be taken to be gamma distributions, and under this assumption both the linear term x and the log term log(x) generally belong in the model. The log-odds graph is a very useful graphical tool for studying this variable selection problem. A graphical study shows that if the conditional distributions of x|y = 0 and x|y = 1 overlap substantially, both the linear and log terms are needed. In contrast, if the two distributions are well separated, only one of the linear or log terms is needed.
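The reason gamma-distributed predictors call for both an x term and a log(x) term can be checked numerically. The sketch below is illustrative only and is not taken from the paper: it assumes x|y = j follows a gamma distribution with shape a_j and rate b_j, in which case the log-density ratio log f_1(x) - log f_0(x) equals c + (a_1 - a_0) log(x) - (b_1 - b_0) x for a constant c. The function names and the parameter values are hypothetical choices made for the example.

```python
import math


def gamma_logpdf(x, shape, rate):
    """Log-density of a gamma distribution with the given shape and rate."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)


def log_density_ratio_coefs(shape0, rate0, shape1, rate1):
    """Coefficients (intercept, x, log x) of log f1(x) - log f0(x).

    Assumes x | y = j ~ Gamma(shape_j, rate_j); this parametrization is an
    assumption of the sketch, not something stated in the abstract.
    """
    intercept = (shape1 * math.log(rate1) - shape0 * math.log(rate0)
                 - math.lgamma(shape1) + math.lgamma(shape0))
    coef_x = -(rate1 - rate0)       # coefficient of the linear term x
    coef_log_x = shape1 - shape0    # coefficient of the log term log(x)
    return intercept, coef_x, coef_log_x


# Hypothetical parameter values, chosen only for illustration.
shape0, rate0 = 2.0, 1.0
shape1, rate1 = 3.0, 0.5
c0, c_x, c_log = log_density_ratio_coefs(shape0, rate0, shape1, rate1)

# The closed form should agree with the directly computed log-density ratio.
for x in (0.5, 1.0, 2.0, 5.0):
    direct = gamma_logpdf(x, shape1, rate1) - gamma_logpdf(x, shape0, rate0)
    closed = c0 + c_x * x + c_log * math.log(x)
    assert abs(direct - closed) < 1e-12
```

By Bayes' rule the log-odds of y = 1 given x differ from this ratio only by the constant log of the prior odds, so the same two terms appear in the logistic model. In this algebraic sketch the x coefficient vanishes when the two rates are equal, and the log(x) coefficient vanishes when the two shapes are equal.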
