Proposal of a decision process for generating correct association rules

Decision process for right association rule generation

  • Submitted : 2010.02.02
  • Reviewed : 2010.03.09
  • Published : 2010.03.31

Abstract

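Data mining is a technique for systematically and automatically finding useful information that is not readily apparent in vast amounts of data. One of its important goals is to discover and determine the relationships among several variables. An association rule reflects the association between items in transactions represented as itemsets. Because it quantifies the relationship between itemsets through interestingness measures such as support, confidence, and net confidence, and thereby indicates how strongly two or more itemsets are related, it is widely used in practice. This paper proposes a new decision process for generating association rules correctly by addressing the shortcomings of confidence and net confidence, two widely used interestingness measures. The proposed decision process is particularly efficient for searching for association rules in streaming databases.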

Data mining is the process of sorting through large amounts of data and picking out useful information. An important goal of data mining is to discover, define, and determine the relationships among several variables. Association rule mining is an important research topic in data mining. An association rule technique finds relations among the items in a massive database. It consists of two steps: finding frequent itemsets and then extracting interesting rules from them. Several interestingness measures have been developed for association rule mining. These measures are useful because they provide a statistical or logical basis for pruning uninteresting rules. This paper explores some problems with two interestingness measures, confidence and net confidence, and then proposes a decision process for generating correct association rules using these measures.
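To make the measures named in the abstract concrete, the following is a minimal Python sketch of the two-step procedure (finding frequent itemsets, then scoring candidate rules) on a toy dataset. It is not the decision process proposed in the paper. The transaction data, the function names, and the brute-force enumeration are illustrative assumptions. The abstract does not define net confidence, so the sketch assumes the form P(Y|X) - P(Y|not X), which should be checked against the paper's own definition.

```python
from itertools import combinations

# Hypothetical toy transactions, invented for illustration only.
TRANSACTIONS = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread", "milk"}),
]


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Step 1: brute-force enumeration of itemsets meeting `min_support`.

    Exponential in the number of items; adequate only for tiny examples.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(transactions, candidate)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result


def confidence(transactions, x, y):
    """Confidence of the rule X -> Y, i.e. an estimate of P(Y | X)."""
    px = support(transactions, x)
    if px == 0:
        return 0.0
    return support(transactions, frozenset(x) | frozenset(y)) / px


def net_confidence(transactions, x, y):
    """ASSUMED definition: P(Y | X) - P(Y | not X).

    Algebraically equal to (P(XY) - P(X)P(Y)) / (P(X)(1 - P(X))).
    The abstract does not state the formula; verify against the paper.
    """
    px = support(transactions, x)
    py = support(transactions, y)
    pxy = support(transactions, frozenset(x) | frozenset(y))
    if px == 0 or px == 1:
        return 0.0  # undefined when X never or always occurs
    return (pxy - px * py) / (px * (1 - px))
```

On this toy data, `confidence(TRANSACTIONS, {"bread"}, {"milk"})` is about 0.75, while `net_confidence` under the assumed definition is about -0.25, because milk appears in the only transaction without bread. This gap between a high confidence and a negative association is the kind of situation in which the two measures can point to different conclusions.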
