A study on decision tree creation using marginally conditional variables


  • Cho, Kwang-Hyun (Department of Early Childhood Education, Changwon National University)
  • Park, Hee-Chang (Department of Statistics, Changwon National University)
  • Received : 2012.02.08
  • Accepted : 2012.03.16
  • Published : 2012.03.31

Abstract

Data mining is a method of searching for interesting relationships among items in a given database, and the decision tree is one of its most representative algorithms. A decision tree classifies a group of interest into several subgroups, or performs prediction. In general, when researchers create a decision tree model, the generated model can become complicated depending on the model-creation criteria and the number of input variables. In particular, when a model has many input variables, the resulting tree can be complex and difficult to analyze. If marginally conditional variables (intervening variables, external variables) exist among the input variables, those variables are judged to have no direct relevance to the target. In this study, we suggest a method of creating a decision tree that accounts for marginally conditional variables, and we apply it to actual data to examine its efficiency.

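The screening step the abstract describes, dropping input variables whose association with the target disappears once another variable is conditioned on, can be sketched as follows. The criterion used here (information gain that vanishes within the strata of some other variable) is an illustrative stand-in, not the authors' actual test, and all names and thresholds are assumptions.

```python
from collections import Counter, defaultdict
import math

def entropy(labels):
    # Shannon entropy of a list of class labels
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(xs, ys):
    # reduction in entropy of ys when split by the values of xs
    n = len(ys)
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    return entropy(ys) - sum(len(g) / n * entropy(g) for g in groups.values())

def conditional_gain(xs, zs, ys):
    # information gain of xs about ys, averaged within each stratum of zs
    n = len(ys)
    strata = defaultdict(lambda: ([], []))
    for x, z, y in zip(xs, zs, ys):
        strata[z][0].append(x)
        strata[z][1].append(y)
    return sum(len(sy) / n * info_gain(sx, sy) for sx, sy in strata.values())

def marginally_conditional(data, target, threshold=0.01):
    # Flag variables that are marginally associated with the target but
    # carry no information once some other variable is conditioned on.
    # (Hypothetical criterion, sketched for illustration only.)
    flagged = set()
    names = [k for k in data if k != target]
    for x in names:
        if info_gain(data[x], data[target]) <= threshold:
            continue  # not even marginally associated with the target
        for z in names:
            if z != x and conditional_gain(data[x], data[z], data[target]) <= threshold:
                flagged.add(x)
                break
    return flagged

# Toy data: Y is determined by Z; X is a noisy copy of Z, so X is
# associated with Y only through Z and should be flagged.
data = {
    "Z": [0, 0, 0, 0, 1, 1, 1, 1],
    "X": [0, 0, 0, 1, 1, 1, 1, 0],
    "Y": [0, 0, 0, 0, 1, 1, 1, 1],
}
print(marginally_conditional(data, "Y"))  # → {'X'}
```

Variables flagged this way would be removed from the input set before the tree is grown, which is how the model stays compact when many inputs are present.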

References

  1. Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984). Classification and regression trees, Wadsworth & Brooks, California.
  2. Cho, K. H. and Park, H. C. (2011a). A study on insignificant rules discovery in association rule mining. Journal of the Korean Data & Information Science Society, 22, 81-88.
  3. Cho, K. H. and Park, H. C. (2011b). A study on decision tree creation using intervening variable. Journal of the Korean Data & Information Science Society, 22, 671-678.
  4. Cho, K. H. and Park, H. C. (2011c). A study on removal of unnecessary input variables using multiple external association rule. Journal of the Korean Data & Information Science Society, 22, 877-884.
  5. Cho, K. H. and Park, H. C. (2011d). Discovery of insignificant association rules using external variable. Journal of the Korean Data Analysis Society, 13, 1343-1352.
  6. Hartigan, J. A. (1975). Clustering algorithms, John Wiley & Sons, New York.
  7. Park, H. C. (2010). Association rule ranking function by decreased lift influence. Journal of the Korean Data & Information Science Society, 21, 397-405.
  8. Quinlan, J. R. (1993). C4.5: Programs for machine learning, Morgan Kaufmann Publishers, San Francisco.

Cited by

  1. Determinants of student course evaluation using hierarchical linear model vol.24, pp.6, 2013, https://doi.org/10.7465/jkdi.2013.24.6.1285
  2. Usage of auxiliary variable and neural network in doubly robust estimation vol.24, pp.3, 2013, https://doi.org/10.7465/jkdi.2013.24.3.659
  3. Analysis of employee's characteristic using data visualization vol.25, pp.4, 2014, https://doi.org/10.7465/jkdi.2014.25.4.727
  4. A study on 3-step complex data mining in society indicator survey vol.23, pp.5, 2012, https://doi.org/10.7465/jkdi.2012.23.5.983
  5. The study on the determinants of the number of job changes vol.26, pp.2, 2015, https://doi.org/10.7465/jkdi.2015.26.2.387
  6. Major gene interactions effect identification on the quality of Hanwoo by radial graph vol.24, pp.1, 2013, https://doi.org/10.7465/jkdi.2013.24.1.151