DOI QR코드

DOI QR Code

The development of symmetrically and attributably pure confidence in association rule mining

연관성 규칙에서 활용 가능한 대칭적 기여 순수 신뢰도의 개발

  • Received : 2014.04.02
  • Accepted : 2014.05.19
  • Published : 2014.05.31

Abstract

The most widely used data mining technique for big data analysis is to generate meaningful association rules. This method has been used to find the relationship between set of items based on the association criteria such as support, confidence, lift, etc. Among them, confidence is the most frequently used, but it has the drawback that we can not know the direction of association by it. The attributably pure confidence was developed to compensate for this drawback, but the value was changed by the position of two item sets. In this paper, we propose four symmetrically and attributably pure confidence measures to compensate the shortcomings of confidence and the attributably pure confidence. And then we prove three conditions of interestingness measure by Piatetsky-Shapiro, and comparative studies with confidence, attributably pure confidence, and four symmetrically and attributably pure confidence measures are shown by numerical examples. The results show that the symmetrically and attributably pure confidence measures are better than confidence and the attributably pure confidence. Also the measure NSAPis found to be the best among these four symmetrically and attributably pure confidence measures.

빅 데이터 분석을 위한 데이터마이닝 기법 중의 하나인 연관성 규칙은 지지도, 신뢰도, 향상도 등의 여러 가지 연관성 평가기준을 기반으로 하여 항목집합들 간의 관련성을 찾아내는 데 활용되고 있다. 기본적인 연관성 평가기준들 중에서 가장 많이 활용되고 있는 신뢰도는 연관성의 방향 (음 또는 양)을 알 수가 없다는 단점을 가지고 있다. 이를 보완하기 위한 측도로 순수 신뢰도 기여 순수 신뢰도가 제안되었으나, 이는 전항과 후항이 바뀌면 그 값이 달라지는 문제점이 있다. 본 논문에서는 기존의 신뢰도와 순수 신뢰도, 그리고 기여 순수 신뢰도의 단점을 보완한 연관성 평가 기준으로 네 가지의 대칭적 기여 순수 신뢰도를 제안하였다. 또한 신뢰도와 기여 순수 신뢰도, 그리고 네 가지의 대칭적 기여 순수 신뢰도를 예제를 통하여 비교 분석하였다. 그 결과, 대칭적 기여 순수 신뢰도는 그 부호에 의해 연관성 규칙의 방향을 파악할 수 있는 동시에 전항과 후항이 바뀌어도 그 값이 변하지 않으므로 연관성 규칙을 생성하는 데 매우 유익한 평가 기준이라는 사실을 확인할 수 있었다. 이들 네 가지 대칭적 기여 순수 신뢰도 중에서는 두 종류의 기여 순수 신뢰도의 분자의 합과 분모의 합의 비로 나타나는 측도가 가장 바람직한 것으로 예제를 통하여 확인하였다.

Keywords

References

  1. Agrawal, R., Imielinski, R. and Swami, A. (1993). Mining association rules between sets of items in large databases. Proceedings of the ACM SIGMOD Conference on Management of Data, 207-216.
  2. Ahn, K. and Kim, S. (2003). A new interstingness measure in association rules mining. Journal of the Korean Institute of Industrial Engineers, 29, 41-48.
  3. Cho, K. H. and Park, H. C. (2011a). Study on the multi intervening relation in association rules. Journal of the Korean Data Analysis Society, 13, 297-306.
  4. Cho, K. H. and Park, H. C. (2011b). A study on insignificant rules discovery in association rule mining. Journal of the Korean Data & Information Science Society, 22, 81-88.
  5. Freitas, A. (1999). On rule interestingness measures. Knowledge-based System, 12, 309-315. https://doi.org/10.1016/S0950-7051(99)00019-2
  6. Geng, L. and Hamilton, H. J. (2006). Interestingness measures for data mining: A survey. ACM Computing Surveys, 38, 1-32. https://doi.org/10.1145/1132952.1132953
  7. Han, J., Pei, J. and Yin, Y. (2000). Mining frequent patterns without candidate generation. Proceedings of ACM SIGMOD Conference on Management of Data, 1-12.
  8. Hilderman, R. J. and Hamilton, H. J. (1999). Knowledge discovery and interestingness measures: A survey, Technical Report CS 99-04, Department of Computer Science, University of Regina, 1-27.
  9. Jin, D. S., Kang, C., Kim, K. K. and Choi, S. B. (2011). CRM on travel agency using association rules. Journal of the Korean Data Analysis Society, 13, 2945-2952.
  10. Omiecinski, E. R. (2003). Alternative interest measures for mining associations in databases. IEEE Transactions on Knowledge and Data Engineering, 15, 57-69. https://doi.org/10.1109/TKDE.2003.1161582
  11. Park, H. C. (2011a). Association rule ranking function by decreased lift influence. Journal of the Korean Data & Information Science Society, 22, 179-188.
  12. Park, H. C. (2011b). The proposition of attributably pure confidence in association rule mining. Journal of the Korean Data & Information Science Society, 22, 235-243.
  13. Park, H. C. (2012a). Negatively attributable and pure confidence for generation of negative association rules. Journal of the Korean Data & Information Science Society, 23, 707-716. https://doi.org/10.7465/jkdi.2012.23.5.939
  14. Park, H. C. (2012b). Exploration of PIM based similarity measures as association rule thresholds. Journal of the Korean Data & Information Science Society, 23, 1127-1135. https://doi.org/10.7465/jkdi.2012.23.6.1127
  15. Park, H. C. (2013a). The proposition of compared and attributably pure confidence in association rule mining. Journal of the Korean Data & Information Science Society, 24, 523-532. https://doi.org/10.7465/jkdi.2013.24.3.523
  16. Park, H. C. (2013b). Proposition of causal association rule thresholds. Journal of the Korean Data & Information Science Society, 24, 1189-1197. https://doi.org/10.7465/jkdi.2013.24.6.1189
  17. Pei, J., Han, J. and Mao, R. (2000). CLOSET: An efficient algorithm for mining frequent closed itemsets. Proceedings of ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 21-30.
  18. Piatetsky-Shapiro, G. (1991). Discovery, analysis and presentation of strong rules. Knowledge Discovery in Databases, AAAI/MIT Press, 229-248.
  19. Silberschatz, A. and Tuzhilin, A. (1996). What makes patterns interesting in knowledge discovery systems. IEEE Transactions on Knowledge Data Engineering, 8, 970-974. https://doi.org/10.1109/69.553165
  20. Tan, P. N., Kumar, V. and Srivastava, J. (2002). Selecting the right interestingness measure for association patterns. Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 32-41.
  21. Wu, T., Chen, Y. and Han, J. (2010). Re-examination of interestingness measures in pattern mining: A unified framework. Data Mining and Knowledge Discovery, 21, 371-397. https://doi.org/10.1007/s10618-009-0161-2