Subgraph Searching Scheme Based on Path Queries in Distributed Environments

Korean title (translated): Subgraph Search Technique Based on Path Queries in Distributed Environments

Kim, Minyoung;Choi, Dojin;Park, Jaeyeol;Kim, Yeondong;Lim, Jongtae;Bok, Kyoungsoo;Choi, Han Suk;Yoo, Jaesoo

  • Received : 2018.12.20
  • Accepted : 2019.01.16
  • Published : 2019.01.28


Networks represented as graph data structures are used in many applications to model interactions between entities. Recently, as advances in big data technology have increased the size of the networks to be processed, handling them on a single server has become more difficult, and the need for distributed processing has grown accordingly. In this paper, we propose a distributed processing system for efficiently performing subgraph … and stores. To reduce unnecessary searches, we use statistical information about the data to determine the search order through probabilistic scoring. Because the relationship between vertices and their degrees can show different characteristics depending on the type of data, we apply a different scoring method to graphs with different degree-distribution characteristics and use the resulting scores to choose a search order that avoids unnecessary searches. The distributed servers then search the graph sequentially in the determined order. To demonstrate the superiority of the proposed method, we compared its performance with that of an existing method; the proposed method improves search time by about 3% to 10%.
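The abstract does not give the actual scoring function, so the following is only a minimal sketch of the general idea of distribution-aware, statistics-based search ordering, not the authors' method. It assumes the query graph has already been decomposed into label paths, that vertex labels are independent of graph structure, and that "skewed" degree distributions are detected with a coefficient-of-variation test. The functions `degree_sequence`, `branching_factor`, and `search_order` and the threshold `cv_threshold` are hypothetical. For near-uniform graphs the sketch estimates fan-out per hop with the mean degree E[d]; for strongly skewed (e.g. power-law-like) graphs it uses E[d²]/E[d], the expected degree of a vertex reached by following a random edge, which rises sharply when hubs are present.

```python
from collections import Counter
from math import prod, sqrt


def degree_sequence(edges):
    """Degree of every vertex in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return list(deg.values())


def branching_factor(degrees, cv_threshold=1.0):
    """Expected neighbours expanded per hop (hypothetical two-method rule).

    Near-uniform degree sequences use the mean degree E[d]. Strongly skewed
    ones (coefficient of variation >= cv_threshold) use E[d^2]/E[d], the
    expected degree of a vertex reached by following a random edge.
    """
    m1 = sum(degrees) / len(degrees)
    m2 = sum(d * d for d in degrees) / len(degrees)
    cv = sqrt(max(m2 - m1 * m1, 0.0)) / m1
    return m2 / m1 if cv >= cv_threshold else m1


def search_order(paths, label_of, edges, cv_threshold=1.0):
    """Sort label paths so the path with the fewest expected matches is searched first."""
    counts = Counter(label_of.values())
    n = len(label_of)
    b = branching_factor(degree_sequence(edges), cv_threshold)

    def score(path):
        # Candidates for the first vertex, times, for each later hop, the expected
        # fan-out b multiplied by the probability that a neighbour carries the
        # required label (labels assumed independent of structure).
        return counts[path[0]] * prod(b * counts[lbl] / n for lbl in path[1:])

    return sorted(paths, key=score)


if __name__ == "__main__":
    labels = {0: "A", 1: "B", 2: "B", 3: "C", 4: "C", 5: "C"}
    ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
    query_paths = [["C", "B"], ["A", "C", "C"], ["B", "A"]]
    print(search_order(query_paths, labels, ring))
    # prints [['B', 'A'], ['A', 'C', 'C'], ['C', 'B']]
```

On the six-vertex ring every vertex has degree 2, so the mean-degree rule applies and the most selective path, `['B', 'A']`, is searched first. In a distributed setting, each server would then extend partial matches path by path in this order, so that the early, selective paths prune candidates before the expensive ones are expanded.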


Graph Data;Graph Search;Distributed Processing;Subgraph Matching;Big Data


Grant : Development of a High-Performance Visual Discovery Platform for Understanding and Predicting Real-Time Large-Scale Video Data

Supported by : National Research Foundation of Korea (NRF), Institute for Information & communications Technology Promotion (IITP)