Utilizing Purely Symmetric J Measure for Association Rules

연관성 규칙의 탐색을 위한 순수 대칭적 J 측도의 활용

  • Received : 2018.11.20
  • Accepted : 2018.12.20
  • Published : 2018.12.31

Abstract

Data mining offers a variety of techniques, including association rules, cluster analysis, decision trees, and neural networks. Among them, association rules are evaluated using criteria such as support, confidence, and lift. Agrawal et al. (1993) first proposed association rules, and many researchers have studied them since. Recently, studies related to cross entropy have been published (Park, 2016b). In this paper, we propose a purely symmetric J measure that incorporates directionality and purity into the previously published J measure, and we examine its usefulness through examples. The results show that, as the co-occurrence frequency increases, the purely symmetric J measure changes more distinctly than the conventional J measure, the symmetric J measure, and the pure cross entropy measure. Its range of variation is also larger as the mismatch frequency changes, so the presence or absence of an association can be identified more clearly.

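Techniques developed in the field of data mining include association rules, cluster analysis, decision trees, and neural networks. Among these, association rule mining searches for specific associations between items using evaluation criteria such as support, confidence, and lift (Park, 2014). Association rules were first proposed by Agrawal et al. (1993) and have since been studied by many researchers; recently, studies related to cross entropy have been published (Park, 2016b). In this paper, we propose a purely symmetric J measure that takes directionality and purity into account in the previously published J measure, and we examine its usefulness through examples. As a result, we found that the purely symmetric J measure changes far more distinctly than the conventional J measure, the symmetric J measure, and the pure cross entropy measure as the co-occurrence frequency increases. Its range of variation also grows with the size of the mismatch frequency, which makes the presence or absence of an association easier to identify. We therefore expect that the purely symmetric J measure can be applied to the evaluation of association rules in any field where data are available.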
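The evaluation criteria named in the abstract can be illustrated with a minimal Python sketch. It computes support, confidence, and lift for a rule A -> B from a 2x2 table of counts, together with the conventional J measure in the form given by Smyth and Goodman (1992). The function name `rule_measures`, its argument names, and the example counts are hypothetical and chosen only for illustration; the sketch assumes that A occurs at least once and that B is neither always present nor always absent. The purely symmetric J measure proposed in the paper is not defined in the abstract and is therefore not implemented here.

```python
import math


def rule_measures(n_ab, n_a_only, n_b_only, n_neither):
    """Support, confidence, lift, and J measure for the rule A -> B.

    Arguments are the four cell counts of a 2x2 contingency table
    (hypothetical names): both A and B, A only, B only, neither.
    Assumes P(A) > 0 and 0 < P(B) < 1.
    """
    n = n_ab + n_a_only + n_b_only + n_neither
    p_a = (n_ab + n_a_only) / n   # P(A)
    p_b = (n_ab + n_b_only) / n   # P(B)
    support = n_ab / n            # P(A and B)
    confidence = support / p_a    # P(B | A)
    lift = confidence / p_b       # P(B | A) / P(B)

    def term(p, q):
        # p * log2(p / q), with the convention 0 * log(0) = 0
        return 0.0 if p == 0 else p * math.log2(p / q)

    # J measure (Smyth and Goodman, 1992): P(A) times the relative entropy
    # between the distribution of B given A and the prior distribution of B.
    j = p_a * (term(confidence, p_b) + term(1 - confidence, 1 - p_b))
    return {"support": support, "confidence": confidence, "lift": lift, "j": j}


# Illustrative counts only; these are not data from the paper.
associated = rule_measures(40, 10, 10, 40)
independent = rule_measures(25, 25, 25, 25)
print(associated)
print(independent)
```

When A and B are independent, the confidence equals P(B), so the lift is 1 and the J measure is 0; both move away from those values as the co-occurrence count grows relative to the mismatch counts.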

References

  1. Agrawal, R., Imielinski, T., Swami, A. (1993). Mining association rules between sets of items in large databases, Proceedings of the ACM SIGMOD Conference on Management of Data, 207-216.
  2. Chun, I. J., Eun, H. C. (2014). Association rule mining on viewing rate analysis: in case of drama genre of terrestrial broadcasters, Korean Journal of Journalism & Communication Studies, 58(5), 391-416. (in Korean).
  3. Park, H. C. (2013). A proposition of association rule thresholds considering relative occurrence/nonoccurrence, Journal of the Korean Data Analysis Society, 15(4), 1841-1850. (in Korean).
  4. Park, H. C. (2014). Comparison of confidence measures useful for classification model building, Journal of the Korean Data and Information Science Society, 25(2), 365-371. (in Korean). https://doi.org/10.7465/jkdi.2014.25.2.365
  5. Park, H. C. (2016a). Proposition of entropy based association thresholds, Journal of the Korean Data Analysis Society, 18(4), 1905-1914. (in Korean).
  6. Park, H. C. (2016b). Proposition of pure signed Hellinger measure as association rule threshold, Journal of the Korean Data Analysis Society, 18(5), 2477-2484. (in Korean).
  7. Park, H. C. (2017a). Alternative plan of elementary association threshold by symmetric J measure, Journal of the Korean Data Analysis Society, 19(4), 1887-1895. (in Korean).
  8. Park, H. C. (2017b). Proposition of adjusted balance Hellinger measure as interestingness measure, Journal of the Korean Data Analysis Society, 19(4), 1887-1895. (in Korean).
  9. Park, H. C. (2018). Proposition of pure cross entropy in association rule technique, Journal of the Korean Data Analysis Society, 20(2), 669-679. (in Korean).
  10. Smyth, P., Goodman, R. M. (1992). An information theoretic approach to rule induction from databases, IEEE Transactions on Knowledge and Data Engineering, 4(4), 301-316. https://doi.org/10.1109/69.149926