Proposition of balanced comparative confidence considering all available diagnostic tools

Proposal of a balanced comparative confidence using all available diagnostic tools

  • Received : 2015.04.13
  • Accepted : 2015.05.18
  • Published : 2015.05.31

Abstract

According to Wikipedia, big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Data mining is the computational process of discovering patterns in huge data sets, drawing on methods such as association rules, decision trees, clustering, artificial intelligence, and machine learning. The association rule is a well-researched method for discovering interesting relationships between itemsets in huge databases and has been applied in various fields. Depending on the direction of association, there are positive, negative, and inverse association rules. When setting evaluation criteria for association rules, it is desirable to consider all three types at the same time. To this end, we propose a balanced comparative confidence that considers sensitivity, specificity, false positives, and false negatives, check it against the conditions for an association threshold given by Piatetsky-Shapiro, and compare it with comparative confidence and inversely comparative confidence through a few experiments.

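Today, the spread of information technology and social media has focused attention on big data. Data mining is one of the technologies for processing such data, and among its techniques association rules are widely used. Depending on their direction, there are positive, negative, and inverse association rules, and when setting evaluation criteria it is desirable to consider all three types simultaneously. To this end, this paper proposes a balanced comparative confidence that takes into account sensitivity, specificity, the false positive rate, and the false negative rate, which are among the diagnostic tools used in medical diagnosis. We also check the conditions that an interestingness measure should satisfy and then examine the usefulness of the measure through examples. As a result, the balanced comparative confidence takes a positive value when both the comparative confidence and the inversely comparative confidence are positive, and a negative value when both are negative. Therefore, as an evaluation criterion for association rules, it is preferable to use the balanced comparative confidence rather than to use the comparative confidence and the inversely comparative confidence separately.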
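The four diagnostic quantities named in the abstract can be illustrated on a 2x2 table of transaction counts for two itemsets X and Y. The following is a minimal sketch, not the paper's balanced comparative confidence, whose formula is not given on this page. It assumes that X plays the role of the diagnostic test and Y the role of the condition; the class name `ContingencyTable`, the function name `diagnostic_rates`, and the example counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """Transaction counts for itemsets X and Y (hypothetical layout)."""
    xy: int        # transactions containing both X and Y
    x_not_y: int   # transactions containing X but not Y
    not_x_y: int   # transactions containing Y but not X
    neither: int   # transactions containing neither X nor Y


def diagnostic_rates(t: ContingencyTable) -> dict[str, float]:
    """Return the four rates, assuming X is the 'test' and Y the 'condition'."""
    with_y = t.xy + t.not_x_y          # all transactions where Y occurs
    without_y = t.x_not_y + t.neither  # all transactions where Y does not occur
    if with_y == 0 or without_y == 0:
        raise ValueError("Y and not-Y must each occur at least once")
    return {
        "sensitivity": t.xy / with_y,             # P(X | Y)
        "false_negative": t.not_x_y / with_y,     # P(not X | Y)
        "specificity": t.neither / without_y,     # P(not X | not Y)
        "false_positive": t.x_not_y / without_y,  # P(X | not Y)
    }


# Made-up counts, for illustration only.
table = ContingencyTable(xy=40, x_not_y=10, not_x_y=20, neither=30)
rates = diagnostic_rates(table)
```

Under this mapping, sensitivity and the false negative rate sum to one, as do specificity and the false positive rate, so a measure that uses all four draws on both columns of the table.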

References

  1. Ahn, K. and Kim, S. (2003). A new interestingness measure in association rules mining. Journal of the Korean Institute of Industrial Engineers, 29, 41-48.
  2. Berzal, F., Blanco, I., Sanchez, D. and Vila, M. (2001). A new framework to assess association rules. Proceedings of the 4th International Conference on Intelligent Data Analysis, 95-104.
  3. Hilderman, R. J. and Hamilton, H. J. (2000). Applying objective interestingness measures in data mining systems. Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery, 432-439.
  4. Hwang, J. and Kim, J. (2003). Target marketing using inverse association rule. Journal of Intelligence and Information Systems, 9, 195-209.
  5. Jin, D. S., Kang, C., Kim, K. K. and Choi, S. B. (2011). CRM on travel agency using association rules. Journal of the Korean Data Analysis Society, 13, 2945-2952.
  6. Kim, T. (2002). Estimation of defect rate from the screening test - The case of unknown sensitivity and specificity. Journal of the Korean Society for Quality Management, 30, 144-151.
  7. Kuo, Y. T. (2009). Mining surprising patterns, Doctoral thesis, University of Melbourne, Australia.
  8. Lavrac, N., Flach, P. and Zupan, B. (1999). Rule evaluation measures: a unifying view. Proceedings of the 9th International Workshop on Inductive Logic Programming, 174-185.
  9. Liu, B., Hsu, W., Chen, S. and Ma, Y. (2000). Analyzing the subjective interestingness of association rules. IEEE Intelligent Systems, 15, 47-55. https://doi.org/10.1109/5254.889106
  10. McNicholas, P. D., Murphy, T. B. and O'Regan, O. (2008). Standardising the lift of an association rule. Computational Statistics and Data Analysis, 52, 4712-4721. https://doi.org/10.1016/j.csda.2008.03.013
  11. Park, H. C. (2011a). The proposition of attributably pure confidence in association rule mining. Journal of the Korean Data & Information Science Society, 22, 235-243.
  12. Park, H. C. (2011b). Proposition of symmetrically pure confidence in association rule discovery. Journal of the Korean Data Analysis Society, 13, 879-890.
  13. Park, H. C. (2012). Exploration of symmetric similarity measures by conditional probabilities as association rule thresholds. Journal of the Korean Data Analysis Society, 14, 707-716.
  14. Park, H. C. (2013a). The proposition of compared and attributably pure confidence in association rule mining. Journal of the Korean Data & Information Science Society, 24, 523-532. https://doi.org/10.7465/jkdi.2013.24.3.523
  15. Park, H. C. (2013b). A proposition of association rule thresholds considering relative occurrence/nonoccurrence. Journal of the Korean Data Analysis Society, 15, 1841-1850.
  16. Park, H. C. (2014a). Comparison of confidence measures useful for classification model building. Journal of the Korean Data & Information Science Society, 25, 365-371. https://doi.org/10.7465/jkdi.2014.25.2.365
  17. Park, H. C. (2014b). Proposition of causally confirmed measures in association rule mining. Journal of the Korean Data & Information Science Society, 25, 857-868. https://doi.org/10.7465/jkdi.2014.25.4.857
  18. Park, H. C. (2014c). Development of association rule threshold by balancing of relative rule accuracy. Journal of the Korean Data & Information Science Society, 25, 1345-1352. https://doi.org/10.7465/jkdi.2014.25.6.1345
  19. Piatetsky-Shapiro, G. (1991). Knowledge discovery in databases, MIT Press, Cambridge.

Cited by

  1. A study of relative causal strength from the perspective of interestingness measures vol.28, pp.1, 2015, https://doi.org/10.7465/jkdi.2017.28.1.49