Association rule ranking function using conditional probability increment ratio

An association rank decision function using the conditional probability increment ratio

  • Received : 2010.05.16
  • Accepted : 2010.07.05
  • Published : 2010.07.31

Abstract

The task of association rule mining is to find association relationships among the data items in a database. The three primary measures for association rules are support, confidence, and lift. In this paper we develop an association rule ranking function based on the conditional probability increment ratio. We compare our function with several existing association rule ranking functions using numerical examples. The results suggest that the proposed function is preferable to the existing ones, for two reasons: its value is not affected by any one particular association threshold, and it always lies between -1 and 1 regardless of the ranges of the three association thresholds. The proposed ranking function also reflects well the differences between the association rule measures and their respective minimum thresholds.

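Association rule mining is used to find relationships among items; it discovers meaningful rules by quantifying the relationship between two items on the basis of association measures such as support, confidence, and lift. In this paper we propose an association rank decision function using the conditional probability increment ratio. In particular, to properly reflect the inherent degree of association between itemsets, the function uses the conditional probability increment ratio to rank the rules for which at least one of the three association thresholds is met, so that only the needed association rules are generated. Simulation results confirm that, unlike existing functions, the proposed function is not influenced by any particular association threshold and always takes a value between -1 and 1 regardless of the ranges of the minimum association thresholds. The ranking function using the conditional probability increment ratio also, on the whole, reflects well the differences between the association measures and the minimum association thresholds.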
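To make the three measures named in the abstract concrete, the following is a minimal sketch that computes the standard support, confidence, and lift of a rule X → Y from a list of transactions. The function name `rule_measures` and the toy transactions are illustrative assumptions, not taken from the paper. The proposed ranking function itself is not defined in the abstract, so it is not implemented here.

```python
from typing import Dict, List, Set


def rule_measures(
    transactions: List[Set[str]],
    antecedent: Set[str],
    consequent: Set[str],
) -> Dict[str, float]:
    """Standard support, confidence, and lift for the rule antecedent -> consequent.

    Hypothetical helper for illustration only; it is not the paper's ranking function.
    """
    n = len(transactions)
    if n == 0:
        raise ValueError("at least one transaction is required")

    # Count transactions containing X, Y, and both X and Y.
    n_x = sum(1 for t in transactions if antecedent <= t)
    n_y = sum(1 for t in transactions if consequent <= t)
    n_xy = sum(1 for t in transactions if (antecedent | consequent) <= t)

    if n_x == 0 or n_y == 0:
        raise ValueError("antecedent and consequent must each occur at least once")

    support = n_xy / n              # P(X and Y)
    confidence = n_xy / n_x         # P(Y | X)
    lift = confidence / (n_y / n)   # P(Y | X) / P(Y)
    return {"support": support, "confidence": confidence, "lift": lift}


# Toy data (assumed for illustration).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
measures = rule_measures(transactions, {"bread"}, {"milk"})
print(measures)
```

On this toy data the rule bread → milk has support 0.6 and confidence 0.75, while its lift is below 1 (about 0.94), so a rule can pass a support or confidence threshold and still fail a lift threshold. A ranking function of the kind described in the abstract is meant to order rules in exactly such mixed cases.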
