An Extended Frequent Pattern Tree for Hiding Sensitive Frequent Itemsets

  • Received : 2011.02.21
  • Accepted : 2011.04.28
  • Published : 2011.06.30

Abstract

Recently, data sharing between enterprises or organizations has become necessary for cooperative work. When an enterprise opens its database to affiliates, however, sensitive information may be leaked. To prevent this, sensitive information must be hidden from the database before it is shared. Previous studies on hiding sensitive information applied various heuristic algorithms to preserve the quality of the resulting database, but few have evaluated how the hiding process affects the itemsets it modifies or have tried to minimize the number of hidden items. This paper proposes the eFP-Tree (Extended Frequent Pattern Tree), based on the FP-Tree (Frequent Pattern Tree), for hiding sensitive frequent itemsets. Unlike earlier approaches, the eFP-Tree builds its nodes from transaction, sensitive, and border information together during the frequent itemset generation stage, and it uses the border to minimize the impact on non-sensitive frequent itemsets during the hiding process. When applied to the example transaction database used in this paper, the eFP-Tree kept lost items to an average of 10% or less, demonstrating that it is more effective than existing methods and keeps the quality of the database at an optimal level.
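The problem setting in the abstract can be illustrated with a small sketch. This is not the eFP-Tree, whose construction the abstract does not detail; it is a brute-force baseline, assuming an absolute support threshold and a naive rule for choosing which item to delete. All function names (`frequent_itemsets`, `hide_itemset`, `lost_itemsets`) and the toy database are hypothetical. It shows what "hiding" a sensitive frequent itemset means and how non-sensitive frequent itemsets can be lost as a side effect, which is the cost a border-based method aims to reduce.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration: map each frequent itemset to its support count."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            support = sum(1 for t in transactions if cset <= t)
            if support >= min_support:
                result[cset] = support
                found = True
        if not found:
            break  # support is anti-monotone: no larger itemset can be frequent
    return result


def hide_itemset(transactions, sensitive, min_support):
    """Naive hiding (an assumption, not the paper's method): delete one item of
    `sensitive` from supporting transactions until its support < min_support."""
    sanitized = [set(t) for t in transactions]
    sensitive = frozenset(sensitive)
    victim = min(sensitive)  # arbitrary but deterministic choice of item to delete
    for t in sanitized:
        support = sum(1 for u in sanitized if sensitive <= u)
        if support < min_support:
            break
        if sensitive <= t:
            t.discard(victim)
    return sanitized


def lost_itemsets(original, sanitized, sensitive, min_support):
    """Non-sensitive itemsets that were frequent before hiding but not after."""
    before = frequent_itemsets(original, min_support)
    after = frequent_itemsets(sanitized, min_support)
    sensitive = frozenset(sensitive)
    return {s for s in before if not sensitive <= s and s not in after}


# Toy database (hypothetical, not the example database from the paper).
db = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
clean = hide_itemset(db, {"a", "b"}, min_support=3)
side_effects = lost_itemsets(db, clean, {"a", "b"}, min_support=3)
```

In this toy run, deleting `a` from one transaction is enough to hide `{a, b}`, but it also pushes the non-sensitive itemset `{a, c}` below the threshold. A method that tracks the border of non-sensitive frequent itemsets would try to choose deletions that avoid such losses.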
