DOI QR코드

DOI QR Code

Proposition of causally confirmed measures in association rule mining

인과적 확인 측도에 의한 연관성 규칙 탐색

  • Received : 2014.06.16
  • Accepted : 2014.07.15
  • Published : 2014.07.31

Abstract

Data mining is the representative analysis methodology in the era of big data, and is the process to analyze a massive volume database and summarize it into meaningful information. Association rule technique finds the relationship among several items in huge database using the interestingness measures such as support, confidence, lift, etc. But these interestingness measures cannot be used to establish a causality relationship between antecedent and consequent item sets. Moreover, we can not know association direction by them. This paper propose causally confirmed association thresholds to compensate for these problems, and then check the three conditions of interestingness measures. The comparative studies with basic association thresholds, causal association thresholds, and causally confirmed association thresholds are shown by simulation studies. The results show that causally confirmed association thresholds are better than basic and causal association thresholds.

대량의 데이터로부터 과거에 알려지지 않았던 유용한 정보를 발견하는 기술인 데이터 마이닝 기법은 오늘날 빅 데이터 시대에 가장 대표적인 분석 기법이라고 할 수 있다. 이들 중에서도 연관성 규칙은 지지도, 신뢰도, 향상도 등의 여러 가지 흥미도 측도를 기반으로 하여 항목들 간의 관련성을 찾아내는 것이다. 그러나 기본적인 연관성 평가 기준만으로는 두 항목 간의 인과관계를 설명할 수 없을 뿐만 아니라 연관성의 방향도 파악할 수 없다. 본 논문에서는 이러한 문제를 해결하기 위해 인과적 확인 연관성 평가 기준을 제안하는 동시에, 제안한 평가 기준들이 흥미도 측도의 조건을 충족하는지의 여부를 점검하였다. 본 논문에서 제안한 인과적 확인 향상도는 세 가지 조건 모두를 만족하는 것으로 입증되었다. 인과적 확인 지지도와 인과적 확인 신뢰도는 동시 발생 확률의 값에 따라 단조 증가하는 조건과 각 항목의 주변 확률의 값에 따라 단조 감소하는 조건은 만족하였다. 또한 예제를 통해 기본적인 연관성 평가 기준과 인과적 연관성 평가 기준, 그리고 인과적 확인 연관성 평가 기준을 비교해 본 결과, 본 논문에서 제안하는 인과적 확인 측도들이 다른 평가 기준에 비해 가장 바람직한 측도라는 사실을 파악하였다.

Keywords

References

  1. Agrawal, R., Imielinski, R. and Swami, A. (1993). Mining association rules between sets of items in large databases. Proceedings of the ACM SIGMOD Conference on Management of Data, 207-216.
  2. Agrawal, R. and Srikant, R. (1994). Fast algorithms for mining association rules. Proceedings of the 20th VLDB Conference, 487-499.
  3. Berzal, F., Cubero, J., Marin, N., Sanchez, D., Serrano, J. and Vila, A. (2005). Association rule evaluation for classification purposes. Actas del III Taller Nacional de Mineria de Datosy Aprendizaje, TAMIDA2005, 135-144.
  4. Cai, C. H., Fu, A. W. C., Cheng, C. H. and Kwong, W. W. (1998). Mining association rules with weighted items. Proceedings of International Database Engineering and Applications Symposium, 68-77.
  5. Cho, K. H. and Park, H. C. (2011a). Study on the multi intervening relation in association rules. Journal of the Korean Data Analysis Society, 13, 297-306.
  6. Cho, K. H. and Park, H. C. (2011b). A study on insignificant rules discovery in association rule mining. Journal of the Korean Data & Information Science Society, 22, 81-88.
  7. Han, J. and Fu, Y. (1999). Mining multiple-level association rules in large databases. IEEE Transactions on Knowledge and Data Engineering, 11, 68-77.
  8. Han, J., Pei, J. and Yin, Y. (2000). Mining frequent patterns without candidate generation. Proceedings of ACM SIGMOD Conference on Management of Data, 1-12.
  9. Jin, D. S., Kang, C., Kim, K. K. and Choi, S. B. (2011). CRM on travel agency using association rules. Journal of the Korean Data Analysis Society, 13, 2945-2952.
  10. Kim, B., Park, Y. and Jang, N. (2013). Study for independence of hits in professional baseball games. Journal of the Korean Data & Information Science Society, 24, 1421-1428. https://doi.org/10.7465/jkdi.2013.24.6.1421
  11. Kim, N. (2008). Effect of market basket size on the accuracy of association rule measures. Asia Pacific Journal of Information Systems, 18, 95-114.
  12. Kodratoff, Y. (2000). Comparing machine learning and knowledge discovery in databases: An application to knowledge discovery in texts. Proceeding of Machine Learning and its Applications: Advanced Lectures, 1-21.
  13. Lim, J., Lee, K. and Cho, Y. (2010). A study of association rule by considering the frequency. Journal of the Korean Data & Information Science Society, 21, 1061-1069.
  14. Liu, B., Hsu, W. and Ma, Y. (1999). Mining association rules with multiple minimum supports. Proceedings of the 5th International Conference on Knowledge Discovery and Data Mining, 337-241.
  15. Park, H. C. (2012a). Negatively attributable and pure confidence for generation of negative association rules. Journal of the Korean Data & Information Science Society, 23, 707-716.
  16. Park, H. C. (2012b). Exploration of PIM based similarity measures as association rule thresholds. Journal of the Korean Data & Information Science Society, 23, 1127-1135. https://doi.org/10.7465/jkdi.2012.23.6.1127
  17. Park, H. C. (2013). Proposition of causal association rule thresholds. Journal of the Korean Data & Information Science Society, 24, 1189-1197. https://doi.org/10.7465/jkdi.2013.24.6.1189
  18. Pasquier, N., Bastide, Y., Taouil, R. and Lakhal, L. (1999). Discovering frequent closed itemsets for association rules. Proceedings of the 7th International Conference on Database Theory, 398-416.
  19. Pei, J., Han, J. and Mao, R. (2000). CLOSET: An efficient algorithm for mining frequent closed itemsets. Proceedings of ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 21-30.
  20. Piatetsky-Shapiro, G. (1991). Discovery, analysis and presentation of strong rules. Knowledge Discovery in Databases, AAAI/MIT Press, 229-248.
  21. Saygin, Y., Vassilios, S. V. and Clifton, C. (2002). Using unknowns to prevent discovery of association rules. Proceedings of 2002 Conference on Research Issues in Data Engineering, 45-54.
  22. Srinkant, R., Vu, Q. and Agrawal, R. (1997). Mining association rules with item constraints. Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, 67-73.

Cited by

  1. Proposition of balanced comparative confidence considering all available diagnostic tools vol.26, pp.3, 2015, https://doi.org/10.7465/jkdi.2015.26.3.611