A study on removal of unnecessary input variables using multiple external association rule

  • Cho, Kwang-Hyun (Department of Early Childhood Education, Changwon National University) ;
  • Park, Hee-Chang (Department of Statistics, Changwon National University)
  • Received : 2011.07.18
  • Accepted : 2011.08.21
  • Published : 2011.10.01

Abstract

The decision tree is a representative data mining algorithm, used in many domains such as retail target marketing, fraud detection, data reduction, variable screening, and category merging. It is most useful in classification problems, where a target group is divided into several small groups and predictions are made for each. When we build a decision tree model with a large number of input variables, the resulting tree can become so complex that exploring and analyzing the model is difficult. Moreover, input variables with no intrinsic association can often appear associated because of external variables. In this paper, we study a method for removing unnecessary input variables using multiple external association rules, and we apply the method to real data to examine its efficiency.
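The variable screening described above rests on the standard association-rule measures. A minimal sketch of how support, confidence, and lift are computed for a rule (the toy transactions and item names are illustrative, not the paper's data):

```python
# Hedged sketch: basic association-rule measures used when screening
# input variables. Transactions are sets of items; names are made up.

def rule_measures(transactions, lhs, rhs):
    """Support, confidence and lift for the rule lhs -> rhs."""
    n = len(transactions)
    lhs, rhs = set(lhs), set(rhs)
    n_lhs = sum(1 for t in transactions if lhs <= t)
    n_rhs = sum(1 for t in transactions if rhs <= t)
    n_both = sum(1 for t in transactions if (lhs | rhs) <= t)
    support = n_both / n            # P(lhs and rhs)
    confidence = n_both / n_lhs     # P(rhs | lhs)
    lift = confidence / (n_rhs / n) # P(rhs | lhs) / P(rhs)
    return support, confidence, lift

# Toy data: items observed per record
data = [{"A", "B"}, {"A", "B", "C"}, {"A"}, {"B", "C"}, {"A", "B"}]
s, c, l = rule_measures(data, {"A"}, {"B"})
print(s, c, l)  # support 0.6, confidence 0.75, lift about 0.94
```

A lift near 1 indicates the two sides occur together about as often as independence would predict; values well above 1 flag an apparent association worth checking against external variables.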

The decision tree is a representative data mining algorithm: it charts decision rules to classify a group of interest into several subgroups or to perform prediction. In general, when a decision tree model is built with many input variables, the resulting model can take a complex form, making model exploration and analysis difficult. In such cases, we often observe input variables that have no intrinsic association appearing associated because each variable happens to be linked to some other variable through an external variable. In this paper, we propose a method that removes input variables unnecessary for decision tree construction by using multiple external association rules, which can identify external relationships among the input variables, and we apply the method to real data to examine its efficiency.
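The situation the abstract describes, where an external variable makes two intrinsically unrelated inputs look associated, can be demonstrated with synthetic counts (all numbers below are fabricated for the illustration and are not the paper's method or data):

```python
# Hedged illustration: inputs A and B appear associated overall,
# but the association is induced entirely by an external variable E.
# Within each level of E, A and B are exactly independent.

def lift(records, x, y):
    """Lift of the rule x -> y over a list of 0/1-valued records."""
    n = len(records)
    px = sum(r[x] for r in records) / n
    py = sum(r[y] for r in records) / n
    pxy = sum(r[x] and r[y] for r in records) / n
    return pxy / (px * py)

def stratum(records, e):
    """Records with a given level of the external variable E."""
    return [r for r in records if r["E"] == e]

records = []
# E = 1: A and B independent, each present with probability 12/16
records += [{"E": 1, "A": 1, "B": 1}] * 9
records += [{"E": 1, "A": 1, "B": 0}] * 3
records += [{"E": 1, "A": 0, "B": 1}] * 3
records += [{"E": 1, "A": 0, "B": 0}] * 1
# E = 0: A and B independent, each present with probability 4/16
records += [{"E": 0, "A": 1, "B": 1}] * 1
records += [{"E": 0, "A": 1, "B": 0}] * 3
records += [{"E": 0, "A": 0, "B": 1}] * 3
records += [{"E": 0, "A": 0, "B": 0}] * 9

print(lift(records, "A", "B"))               # 1.25: apparent association overall
print(lift(stratum(records, 1), "A", "B"))   # 1.0: none within E = 1
print(lift(stratum(records, 0), "A", "B"))   # 1.0: none within E = 0
```

Because the marginal lift of 1.25 vanishes once E is held fixed, either A or B carries no information beyond E, and such an input is a candidate for removal before growing the decision tree.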
