Exploration of relationship between confirmation measures and association thresholds

기준 확인 측도와 연관성 평가기준과의 관계 탐색

  • Received : 2013.06.18
  • Accepted : 2013.07.13
  • Published : 2013.07.31

Abstract

Association rule mining is a data mining technique that quantifies the relevance between sets of items in a large database, and it has been applied in various fields such as manufacturing, shopping malls, healthcare, insurance, and education. Researchers have proposed interestingness measures for various kinds of patterns, analyzed their theoretical properties, evaluated them empirically, and suggested strategies for selecting appropriate measures for particular domains and requirements. Such interestingness measures are divided into objective, subjective, and semantic measures. Objective measures are based on the data used in the discovery process and are typically motivated by statistical considerations. Subjective measures take into account not only the data but also the knowledge and interests of the users who examine the pattern, while semantic measures additionally take into account utility and actionability. In a very different context, philosophers of science have devoted a lot of attention to measures of confirmation or evidential support. This paper focuses on asymmetric confirmation measures and compares them with the basic association thresholds using simulated data. As a result, we could distinguish the direction of an association rule with the confirmation measures and interpret the degree of association operationally. Furthermore, the results showed that the measure by Rips and the measure by Kemeny and Oppenheim were better than the other confirmation measures.

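Among data mining techniques, association rule mining measures the association between items in very large collections of event records, such as transaction records for goods or services, and it is applied in many fields including manufacturing, retail, insurance, healthcare, and education. Interestingness measures for discovering meaningful association rules can be broadly classified into objective, subjective, and semantic interestingness measures. Separately, there have been many attempts to develop measures of confirmation or evidential support, but two questions have not yet been settled: whether confirmation measures satisfy the conditions required of association evaluation criteria, and how they relate to the basic association measures of support, confidence, and lift. This paper therefore examines whether the most widely used asymmetric confirmation measures satisfy the criteria for interestingness measures, derives their relationships with the basic association measures algebraically, and then uses an example to assess the usefulness of confirmation measures from the perspective of association rules. All of the confirmation measures considered in this paper satisfied every condition required of an interestingness measure. We also established their relationships with support, confidence, and lift in closed form, and confirmed through examples that they indicate the direction of a rule and admit an operational interpretation. In particular, the measure proposed by Kemeny and Oppenheim and the measure proposed by Rips were found to be the most desirable for use as association evaluation criteria.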
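
To make the comparison described in the abstract concrete, the following is a minimal Python sketch that computes support, confidence, and lift alongside two confirmation measures for a rule A -> B. It is an illustration under stated assumptions, not the paper's own derivation. The formulas for the Rips and Kemeny-Oppenheim measures are the forms commonly given in the confirmation literature (for example Crupi et al., 2007), with the antecedent A treated as the evidence and the consequent B as the hypothesis. The function name `rule_measures` and its parameters are hypothetical.

```python
from fractions import Fraction


def rule_measures(n_both, n_antecedent, n_consequent, n_total):
    """Basic and confirmation measures for the rule A -> B, from counts.

    n_both        transactions containing both A and B
    n_antecedent  transactions containing A
    n_consequent  transactions containing B
    n_total       all transactions

    Assumes 0 < P(A) and 0 < P(B) < 1; otherwise a division by zero occurs.
    """
    p_a = Fraction(n_antecedent, n_total)
    p_b = Fraction(n_consequent, n_total)
    p_ab = Fraction(n_both, n_total)

    support = p_ab                 # P(A and B)
    confidence = p_ab / p_a        # P(B | A)
    lift = confidence / p_b        # P(B | A) / P(B)

    # Rips (assumed form): 1 - P(not B | A) / P(not B)
    rips = 1 - (1 - confidence) / (1 - p_b)

    # Kemeny-Oppenheim (assumed form):
    # [P(A | B) - P(A | not B)] / [P(A | B) + P(A | not B)]
    p_a_given_b = p_ab / p_b
    p_a_given_not_b = (p_a - p_ab) / (1 - p_b)
    kemeny_oppenheim = (p_a_given_b - p_a_given_not_b) / (
        p_a_given_b + p_a_given_not_b
    )

    return {
        "support": support,
        "confidence": confidence,
        "lift": lift,
        "rips": rips,
        "kemeny_oppenheim": kemeny_oppenheim,
    }


# Made-up counts: 100 transactions, A in 40, B in 50, both in 30.
forward = rule_measures(30, 40, 50, 100)   # rule A -> B
backward = rule_measures(30, 50, 40, 100)  # rule B -> A
for name in forward:
    print(name, forward[name], backward[name])
```

With these made-up counts, lift is 3/2 in both directions, whereas the Rips measure is 1/2 for A -> B and 1/3 for B -> A, and the Kemeny-Oppenheim measure is 1/2 and 5/13 respectively. This illustrates how an asymmetric confirmation measure can distinguish the direction of a rule when lift cannot. Exact fractions are used so that the value at independence is exactly 0 rather than a floating-point approximation.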

References

  1. Agrawal, R., Imielinski, T. and Swami, A. (1993). Mining association rules between sets of items in large databases. Proceedings of the ACM SIGMOD Conference on Management of Data, 207-216.
  2. Cho, K. H. and Park, H. C. (2011). Discovery of insignificant association rules using external variable. Journal of the Korean Data Analysis Society, 13, 1343-1352.
  3. Crupi, V., Tentori, K. and Gonzalez, M. (2007). On Bayesian measures of evidential support: Theoretical and empirical issues. Philosophy of Science, 74, 229-252. https://doi.org/10.1086/520779
  4. Freitas, A. (1999). On rule interestingness measures. Knowledge-Based Systems, 12, 309-315. https://doi.org/10.1016/S0950-7051(99)00019-2
  5. Geng, L. and Hamilton, H. J. (2006). Interestingness measures for data mining: A survey. ACM Computing Surveys, 38, 1-32. https://doi.org/10.1145/1132952.1132953
  6. Glass, D. H. (2013). Confirmation measures of association rule interestingness. Knowledge-Based Systems, 44, 65-77. https://doi.org/10.1016/j.knosys.2013.01.021
  7. Han, J. and Fu, Y. (1999). Mining multiple-level association rules in large databases. IEEE Transactions on Knowledge and Data Engineering, 11, 68-77.
  8. Han, J., Pei, J. and Yin, Y. (2000). Mining frequent patterns without candidate generation. Proceedings of ACM SIGMOD Conference on Management of Data, 1-12.
  9. Hilderman, R. J. and Hamilton, H. J. (2000). Applying objective interestingness measures in data mining systems. Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery, 432-439.
  10. Kemeny, J. G. and Oppenheim, P. (1952). Degree of factual support. Philosophy of Science, 19, 307-324. https://doi.org/10.1086/287214
  11. Lim, J., Lee, K. and Cho, Y. (2010). A study of association rule by considering the frequency. Journal of the Korean Data & Information Science Society, 21, 1061-1069.
  12. Liu, B., Hsu, W. and Ma, Y. (1999). Mining association rules with multiple minimum supports. Proceedings of the 5th International Conference on Knowledge Discovery and Data Mining, 337-341.
  13. Liu, B., Hsu, W., Chen, S. and Ma, Y. (2000). Analyzing the subjective interestingness of association rules. IEEE Intelligent Systems, 15, 47-55. https://doi.org/10.1109/5254.889106
  14. Mortimer, H. (1988). The logic of induction, Prentice Hall, Paramus.
  15. Nozick, R. (1981). Philosophical explanations, Clarendon Press, Oxford.
  16. Park, H. C. (2011a). Proposition of negatively pure association rule threshold. Journal of the Korean Data & Information Science Society, 22, 179-188.
  17. Park, H. C. (2011b). The proposition of attributably pure confidence in association rule mining. Journal of the Korean Data & Information Science Society, 22, 235-243.
  18. Park, H. C. (2011c). The application of some similarity measures to association rule thresholds. Journal of the Korean Data Analysis Society, 13, 1331-1342.
  19. Park, H. C. (2012a). Negatively attributable and pure confidence for generation of negative association rules. Journal of the Korean Data & Information Science Society, 14, 707-716.
  20. Park, H. C. (2012b). Exploration of PIM based similarity measures as association rule thresholds. Journal of the Korean Data & Information Science Society, 23, 1127-1135. https://doi.org/10.7465/jkdi.2012.23.6.1127
  21. Pei, J., Han, J. and Mao, R. (2000). CLOSET: An efficient algorithm for mining frequent closed itemsets. Proceedings of ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 21-30.
  22. Piatetsky-Shapiro, G. (1991). Discovery, analysis and presentation of strong rules. Proceedings of the 9th National Conference on Artificial Intelligence: Knowledge Discovery in Databases, 229-248.
  23. Rips, L. J. (2001). Two kinds of reasoning. Psychological Science, 12, 129-134. https://doi.org/10.1111/1467-9280.00322
  24. Saygin, Y., Verykios, V. S. and Clifton, C. (2002). Using unknowns to prevent discovery of association rules. Proceedings of 2002 Conference on Research Issues in Data Engineering, 45-54.
  25. Silberschatz, A. and Tuzhilin, A. (1996). What makes patterns interesting in knowledge discovery systems. IEEE Transactions on Knowledge and Data Engineering, 8, 970-974. https://doi.org/10.1109/69.553165
  26. Srikant, R. and Agrawal, R. (1995). Mining generalized association rules. Proceedings of the 21st VLDB Conference, 407-419.
  27. Tan, P. N., Kumar, V. and Srivastava, J. (2002). Selecting the right interestingness measure for association patterns. Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 32-41.