Subgraph Searching Scheme Based on Path Queries in Distributed Environments

분산 환경에서 경로 질의 기반 서브 그래프 탐색 기법

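  • 김민영 (Department of Information and Communication Engineering, Chungbuk National University) ;
  • 최도진 (Department of Information and Communication Engineering, Chungbuk National University) ;
  • 박재열 (Department of Information and Communication Engineering, Chungbuk National University) ;
  • 김연동 (Department of Information and Communication Engineering, Chungbuk National University) ;
  • 임종태 (Department of Information and Communication Engineering, Chungbuk National University) ;
  • 복경수 (Department of Information and Communication Engineering, Chungbuk National University) ;
  • 최한석 (Department of Computer Engineering, Mokpo National University) ;
  • 유재수 (Department of Information and Communication Engineering, Chungbuk National University)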
  • Received : 2018.12.20
  • Accepted : 2019.01.16
  • Published : 2019.01.28

Abstract

Graph-structured networks are used in many applications to represent interactions between entities. As advances in big data technology increase the size of the networks to be processed, handling them on a single server becomes more difficult, and the need for distributed processing grows. In this paper, we propose a distributed processing system for efficiently performing subgraph searches. To reduce unnecessary searches, we use statistical information about the data to determine the search order through probabilistic scoring. Because the relationship between vertices and degrees in a graph network may differ depending on the type of data, the score is calculated with a different scoring method for each kind of degree distribution, so that unnecessary searches are reduced for graphs with various distribution characteristics. The graph is then searched sequentially across the distributed servers in the determined order. To demonstrate the superiority of the proposed method, we compared its performance with that of the existing method. The results show that the search time improves by about 3 to 10% compared with the existing method.
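The abstract does not give the scoring formulas, so the following Python sketch only illustrates the general idea of probabilistic scoring under stated assumptions. It assumes that a data vertex can match a query vertex only if the labels are equal and the data vertex's degree is at least the query vertex's degree, and that label frequencies and a degree distribution (normal or power-law, the two distributions named in the figure captions) are available as statistics. The function names, the matching rule, and the rule of starting the path search from the more selective end are all hypothetical and are not taken from the paper.

```python
import math


def normal_survival(d, mean, std):
    """P(degree >= d) when degrees are approximated by a normal distribution."""
    return 0.5 * math.erfc((d - mean) / (std * math.sqrt(2)))


def power_law_survival(d, alpha, k_min, k_max):
    """P(degree >= d) for a discrete power law P(k) ~ k^(-alpha) on [k_min, k_max]."""
    weights = {k: k ** -alpha for k in range(k_min, k_max + 1)}
    total = sum(weights.values())
    return sum(w for k, w in weights.items() if k >= d) / total


def filtering_score(label, degree, label_freq, survival):
    """Estimated probability that a random data vertex is filtered out.

    Assumption (not from the paper): a data vertex survives only if it has
    the same label and a degree of at least `degree`, and the two events are
    treated as independent.
    """
    match_prob = label_freq.get(label, 0.0) * survival(degree)
    return 1.0 - match_prob


def search_order(query_path, label_freq, survival):
    """Start the path search from the end that filters more candidates.

    `query_path` is a list of (label, degree) pairs in path order.
    """
    first = filtering_score(*query_path[0], label_freq, survival)
    last = filtering_score(*query_path[-1], label_freq, survival)
    return list(query_path) if first >= last else list(reversed(query_path))


if __name__ == "__main__":
    # Hypothetical statistics: label frequencies and a power-law degree model.
    label_freq = {"A": 0.5, "B": 0.1}
    survival = lambda d: power_law_survival(d, alpha=2.0, k_min=1, k_max=100)
    path = [("A", 2), ("A", 3), ("B", 2)]
    print(search_order(path, label_freq, survival))
```

In this toy setting the rare label "B" filters out more candidates than the common label "A", so the search would begin at the "B" end of the path. Swapping `survival` for a `normal_survival` closure changes only the degree term, which is the sense in which one scoring method per distribution type could be plugged in.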

Figure 1. Overall structure of the proposed scheme

Figure 2. Filtering due to differences in degree and label

Figure 3. Filtering probability for a normal distribution

Figure 4. Filtering probability for a power-law distribution

Figure 5. Query search order and its results

Figure 6. Search time by scoring method

Figure 7. Query types by structure

Figure 8. Search time by query on randomly generated data

Figure 9. Search time by query on real data

Table 1. Performance evaluation environment

Acknowledgement

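Grant : Development of a High-Performance Visual Discovery Platform for Understanding and Predicting Real-Time Large-Scale Video Data

Supported by : National Research Foundation of Korea (NRF), Institute for Information & communications Technology Promotion (IITP)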
