• Title/Summary/Keyword: distributed parallel inference engine

Scalable RDFS Reasoning Using the Graph Structure of In-Memory based Parallel Computing (인메모리 기반 병렬 컴퓨팅 그래프 구조를 이용한 대용량 RDFS 추론)

  • Jeon, MyungJoong;So, ChiSeoung;Jagvaral, Batselem;Kim, KangPil;Kim, Jin;Hong, JinYoung;Park, YoungTack
    • Journal of KIISE, v.42 no.8, pp.998-1009, 2015
  • In recent years, there has been growing interest in RDFS inference as a way to build rich knowledge bases. However, it is difficult to improve inference performance on large data with a single machine. Researchers are therefore developing RDFS inference engines for distributed computing environments. However, existing inference engines cannot process data in real time, are difficult to implement, and are vulnerable to repetitive tasks. To overcome these problems, we propose a method to construct an in-memory distributed inference engine that uses a parallel graph structure. In general, an ontology based on a triple structure has a graph structure, so it is intuitive to design a graph structure-based inference engine. Moreover, the RDFS inference rules can be implemented with the operators of the graph structure, so the inference engine can be designed around the graph structure rather than the structure of a data table. In this study, we evaluate the inference speed of the proposed engine using the LUBM1000 and LUBM3000 data. The experimental results indicate that the proposed in-memory distributed inference engine was about 10 times faster than an in-storage inference engine.
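
    The idea that RDFS rules map onto graph operations can be illustrated with a minimal single-machine sketch. It is not the authors' implementation and is not distributed: it assumes triples are plain Python tuples, and the function name `infer_types` and the sample class names are hypothetical. It covers only two standard RDFS entailment rules, rdfs11 (transitivity of `rdfs:subClassOf`) and rdfs9 (type inheritance along `rdfs:subClassOf`), both expressed as a traversal of the subclass graph.

```python
from collections import defaultdict

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def infer_types(triples):
    """Return the new triples entailed by rdfs11 and rdfs9 (hypothetical helper)."""
    # Build the subclass graph as an adjacency map: class -> direct superclasses.
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == SUBCLASS:
            parents[s].add(o)

    def ancestors(cls):
        # Iterative traversal of the subclass graph; `seen` guards against cycles.
        seen, stack = set(), list(parents.get(cls, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(parents.get(current, ()))
        return seen

    inferred = set()
    for s, p, o in triples:
        if p == SUBCLASS:
            # rdfs11: a class is a subclass of every class reachable from it.
            inferred.update((s, SUBCLASS, a) for a in ancestors(s))
        elif p == TYPE:
            # rdfs9: an instance of a class is an instance of all its superclasses.
            inferred.update((s, TYPE, a) for a in ancestors(o))
    # Report only triples that were not already asserted.
    return inferred - set(triples)


example = [
    ("GradStudent", SUBCLASS, "Student"),
    ("Student", SUBCLASS, "Person"),
    ("alice", TYPE, "GradStudent"),
]
for triple in sorted(infer_types(example)):
    print(triple)
```

    In a distributed setting, the same reachability step would presumably be carried out by a graph-processing framework rather than an in-process dictionary; that part is outside the scope of this sketch.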

An elastic distributed parallel Hadoop system for bigdata platform and distributed inference engines (동적 분산병렬 하둡시스템 및 분산추론기에 응용한 서버가상화 빅데이터 플랫폼)

  • Song, Dong Ho;Shin, Ji Ae;In, Yean Jin;Lee, Wan Gon;Lee, Kang Se
    • Journal of the Korean Data and Information Science Society, v.26 no.5, pp.1129-1139, 2015
  • The inference process generates additional triples from knowledge represented as RDF triples in semantic web technology. Tens of millions of triples of initial big data, together with the additionally inferred triples, become a knowledge base for applications such as QA (question and answer) systems. The inference engine requires more computing resources to process the triples generated during inference. Additional computing resources supplied by the underlying resource pool in cloud computing can shorten the execution time. This paper presents an algorithm that allocates the number of computing nodes "elastically" at runtime on Hadoop, depending on the size of the knowledge data fed in. The proposed model has a layered architecture: a top layer for applications, a middle layer for the distributed parallel inference engine that processes the triples, and a lower layer for elastic Hadoop and server virtualization. System algorithms and test data are analyzed and discussed. The model has the benefit that the rich body of legacy Hadoop applications can run faster on this system without any modification.

Scalable Ontology Reasoning Using GPU Cluster Approach (GPU 클러스터 기반 대용량 온톨로지 추론)

  • Hong, JinYung;Jeon, MyungJoong;Park, YoungTack
    • Journal of KIISE, v.43 no.1, pp.61-70, 2016
  • In recent years, there has been a need for large-scale ontology inference techniques that infer new knowledge from existing knowledge at high speed and support a diversity of semantic services. With recent advances in distributed computing, ontology inference engines have mostly been developed on the Hadoop or Spark frameworks running on large clusters. Parallel programming techniques using GPGPU, which offers many more cores than a CPU, are also used for ontology inference. In this paper, we combine the advantages of both techniques and propose a new method that reasons over large RDFS ontology data using a Spark in-memory framework and performs inference on the distributed data at high speed using GPGPU. With GPGPU, ontology reasoning over high-capacity data can be performed at a lower cost and with higher efficiency than with conventional inference methods. In addition, we show that GPGPU can reduce the data workload on each node of the Spark cluster. To evaluate our approach, we used LUBM datasets ranging from 10 to 120. Our experimental results showed that the proposed reasoning engine performs 7 times faster than a conventional approach that uses a Spark in-memory inference engine.

A Study on Distributed Parallel SWRL Inference in an In-Memory-Based Cluster Environment (인메모리 기반의 클러스터 환경에서 분산 병렬 SWRL 추론에 대한 연구)

  • Lee, Wan-Gon;Bae, Seok-Hyun;Park, Young-Tack
    • Journal of KIISE, v.45 no.3, pp.224-233, 2018
  • Recently, there have been many studies on SWRL reasoning engines that apply user-defined rules to large-scale ontologies in a distributed environment. Unlike schema-based axiom rules, SWRL rules do not allow an efficient inference order to be defined. In addition, unnecessary iterative processes produce a large volume of network-shuffled data. To solve these problems, we propose a method that uses a Map-Reduce algorithm and a distributed in-memory framework to apply multiple rules simultaneously and to minimize the volume of data shuffled between the distributed machines in the cluster. For the experiment, we use the WiseKB ontology, composed of 200 million triples, and 36 user-defined rules. We found that the proposed reasoner completes inference in 16 minutes and is 2.7 times faster than previous reasoning systems that used the LUBM benchmark dataset.

Distributed In-Memory based Large Scale RDFS Reasoning and Query Processing Engine for the Population of Temporal/Spatial Information of Media Ontology (미디어 온톨로지의 시공간 정보 확장을 위한 분산 인메모리 기반의 대용량 RDFS 추론 및 질의 처리 엔진)

  • Lee, Wan-Gon;Lee, Nam-Gee;Jeon, MyungJoong;Park, Young-Tack
    • Journal of KIISE, v.43 no.9, pp.963-973, 2016
  • Providing a semantic knowledge system using media ontologies requires not only conventional axiom reasoning but also knowledge extension based on various types of reasoning. In particular, spatio-temporal information can be used in a variety of artificial intelligence applications, and the importance of spatio-temporal reasoning and representation is continuously increasing. In this paper, we append LOD data related to the public address system to large-scale media ontologies in order to utilize spatial inference in reasoning. We propose an RDFS/Spatial inference system that uses a distributed memory-based framework to reason about large-scale ontologies annotated with spatial information. In addition, we describe a distributed, parallel spatio-temporal SPARQL query processing method designed for large-scale ontology data annotated with spatio-temporal information. To evaluate the performance of our system, we conducted experiments using the LUBM and BSBM data sets as benchmarks for ontology reasoning and query processing.

Distributed Assumption-Based Truth Maintenance System for Scalable Reasoning (대용량 추론을 위한 분산환경에서의 가정기반진리관리시스템)

  • Jagvaral, Batselem;Park, Young-Tack
    • Journal of KIISE, v.43 no.10, pp.1115-1123, 2016
  • An assumption-based truth maintenance system (ATMS) is a tool that maintains the reasoning process of an inference engine. It also supports non-monotonic reasoning based on dependency-directed backtracking. Bookkeeping all the reasoning processes allows it to quickly check and retract beliefs and to efficiently provide solutions for problems with a large search space. However, the amount of data has grown exponentially in recent years, making it impossible to use a single machine to solve large-scale problems. Maintaining the reasoning process for such problems can lead to high computation cost due to large memory overhead. To overcome this drawback, this paper presents an approach to incrementally maintaining the reasoning process of an inference engine on a cluster using Spark. It maintains data dependencies such as assumptions, labels, environments and justifications on a cluster of machines in parallel and efficiently updates changes in a large amount of inferred data. We deployed the proposed ATMS on a cluster of 5 machines, conducted OWL/RDFS reasoning over the Lehigh University Benchmark (LUBM) data and evaluated our system in terms of its performance and functionalities such as assertion, explanation and retraction. In our experiments, the proposed system performed these operations in a reasonably short period of time on an inferred LUBM2000 dataset of over 80GB.