Decision Tree Classifier for Multiple Abstraction Levels of Data

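  • Mina Jeong (정민아; Department of Information and Communications, Gwangju Institute of Science and Technology) ;
  • Doheon Lee (이도헌; Department of BioSystems, Korea Advanced Institute of Science and Technology)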
  • Published : 2003.02.01

Abstract

Because data are collected from disparate sources in many real data mining environments, data values commonly appear at different abstraction levels. This paper shows that such multiple abstraction levels can cause undesirable effects in decision tree classification. After explaining that forcibly equalizing the abstraction levels does not provide a satisfactory solution to this problem, it presents a method that uses the data as they are. The proposed method accommodates the generalization/specialization relationships between data values in both the construction phase and the class assignment phase of decision tree classification. The experimental results show that the proposed method reduces classification error rates significantly when multiple abstraction levels of data are involved.

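In large-scale data mining environments, the data to be analyzed are usually collected from heterogeneous databases or file systems, so the collected data tend to be expressed at different abstraction levels. This paper shows that classification inconsistencies can arise when a conventional decision tree is applied to data expressed at different abstraction levels, and it proposes a solution. The proposed method reflects the generalization/specialization relationships among data values not only in the construction phase of the decision tree but also in the class assignment phase, so that the semantic relationships among the data can be exploited effectively. Experiments based on real data further show that the proposed method reduces the classification error rate markedly compared with the conventional method.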
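
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the underlying issue, not the authors' method. It assumes a hypothetical is-a hierarchy over location values (`PARENT`) and hypothetical helper names (`ancestors`, `is_generalization_of`, `matching_branches`). It shows how a plain equality test finds no branch for a value recorded at a coarser abstraction level, while a test that respects generalization/specialization relationships finds the related branches.

```python
# Hypothetical is-a hierarchy (an assumption for illustration only):
# each key is a value, each entry is its more general parent value.
PARENT = {
    "Seoul": "Korea",
    "Busan": "Korea",
    "Tokyo": "Japan",
    "Korea": "Asia",
    "Japan": "Asia",
}


def ancestors(value):
    """Return the value itself followed by every more general value."""
    chain = [value]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def is_generalization_of(general, specific):
    """True if `general` equals `specific` or lies above it in the hierarchy."""
    return general in ancestors(specific)


def matching_branches(value, branch_labels):
    """Branches compatible with `value`: equal, more general, or more specific."""
    return [
        b for b in branch_labels
        if is_generalization_of(b, value) or is_generalization_of(value, b)
    ]


# A tree node that splits on city-level values.
branches = ["Seoul", "Busan", "Tokyo"]

# A record whose value was recorded at country level.
exact = [b for b in branches if b == "Korea"]    # equality test finds nothing
related = matching_branches("Korea", branches)   # hierarchy-aware test

print(exact)
print(related)
```

How the records that match several branches should then be weighted, during both tree construction and class assignment, is exactly what the paper addresses; this sketch deliberately leaves that part out.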
