• Title, Summary, Keyword: Apache Spark

Distributed Moving Objects Management System for a Smart Black Box

  • Lee, Hyunbyung;Song, Seokil
    • International Journal of Contents / v.14 no.1 / pp.28-33 / 2018
  • In this paper, we design and implement a distributed moving-objects management system for processing location and sensor data from smart black boxes. The proposed system is built on Apache Kafka, Apache Spark and Spark Streaming, HBase, and HDFS. Apache Kafka collects data from smart black boxes and queries from users. The received location data and user queries become the input of Apache Spark Streaming, which preprocesses them for indexing. Recent location data and indexes are kept in memory managed by Apache Spark; older data and indexes are later flushed into HBase. We perform experiments to measure the throughput of the index manager. Finally, we describe the implementation details at the level of Scala functions.
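The abstract describes a two-tier layout: recent locations and their indexes live in memory, and older ones are flushed to HBase. The pure-Python sketch below illustrates only that hot/cold idea under stated assumptions; it uses no Kafka, Spark, or HBase API. The uniform grid index, the `retention` window, the rectangular range query, and the class name `HotColdLocationIndex` are all hypothetical, and a plain dict stands in for the HBase table.

```python
from collections import defaultdict


class HotColdLocationIndex:
    """Hypothetical sketch: recent points sit in an in-memory grid index,
    older points are flushed to a cold store (a dict standing in for HBase)."""

    def __init__(self, cell_size, retention):
        self.cell_size = cell_size      # grid cell width (assumed unit)
        self.retention = retention      # how long a point stays "hot" (assumed)
        self.hot = defaultdict(list)    # cell -> [(object_id, x, y, t)]
        self.cold = defaultdict(list)   # stand-in for an HBase table keyed by cell

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, object_id, x, y, t):
        self.hot[self._cell(x, y)].append((object_id, x, y, t))

    def flush(self, now):
        """Move points older than the retention window from hot to cold storage."""
        for cell in list(self.hot):
            keep, old = [], []
            for rec in self.hot[cell]:
                (old if now - rec[3] > self.retention else keep).append(rec)
            self.cold[cell].extend(old)
            if keep:
                self.hot[cell] = keep
            else:
                del self.hot[cell]

    def range_query(self, x0, y0, x1, y1):
        """Return ids of objects inside the rectangle, searching both tiers."""
        cx0, cy0 = self._cell(x0, y0)
        cx1, cy1 = self._cell(x1, y1)
        result = set()
        for tier in (self.hot, self.cold):
            for cx in range(cx0, cx1 + 1):
                for cy in range(cy0, cy1 + 1):
                    # .get() does not create empty cells in the defaultdict
                    for oid, x, y, _ in tier.get((cx, cy), []):
                        if x0 <= x <= x1 and y0 <= y <= y1:
                            result.add(oid)
        return result


# Usage: car-1 is older than the 60-unit retention window, so it is flushed.
idx = HotColdLocationIndex(cell_size=10.0, retention=60)
idx.insert("car-1", 3.0, 4.0, t=0)
idx.insert("car-2", 25.0, 5.0, t=100)
idx.flush(now=100)
print(idx.range_query(0, 0, 30, 10))
```

In the described system the insert path would run inside Spark Streaming micro-batches; how the authors actually key or partition their index is not stated in the abstract.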

Spark Framework Based on a Heterogenous Pipeline Computing with OpenCL

  • Kim, Daehee;Park, Neungsoo
    • The Transactions of The Korean Institute of Electrical Engineers / v.67 no.2 / pp.270-276 / 2018
  • Apache Spark is one of the high-performance in-memory computing frameworks for big-data processing. Recently, to improve its performance, general-purpose computing on graphics processing units (GPGPU) has been applied to the Apache Spark framework. Previous Spark-GPGPU frameworks focus on overcoming the implementation difficulties that arise from the differences between the GPGPU and Spark computing environments. In this paper, we propose a Spark framework based on heterogeneous pipeline computing with OpenCL to further improve performance. The proposed framework overlaps the CPU's Java-to-native memory copies with CPU-GPU communication (DMA) and GPU kernel computation to hide CPU idle time. In addition, the CPU-GPU communication buffers are implemented as switching dual buffers, which reduce the size of the mapped memory region and thereby the memory-mapping overhead. Experimental results showed that the proposed framework was up to 2.13 times faster than the previous Spark framework using OpenCL.
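The overlap described above rests on double buffering: while one buffer is being filled, the other is being consumed. A minimal pure-Python sketch of switching dual buffers follows, assuming threads stand in for the CPU copy stage and the GPU compute stage. No OpenCL calls are made, and the function name `pipelined_sum_of_squares` and its sum-of-squares "kernel" are hypothetical placeholders.

```python
import queue
import threading


def pipelined_sum_of_squares(chunks, num_buffers=2):
    """Hypothetical sketch of switching buffers: a copy stage fills one
    buffer while a compute stage consumes another, so the stages overlap."""
    free = queue.Queue()    # buffers available to the copy stage
    ready = queue.Queue()   # filled buffers waiting for the compute stage
    for _ in range(num_buffers):
        free.put([])
    results = []

    def copy_stage():
        for chunk in chunks:
            buf = free.get()        # blocks until a buffer is released
            buf.clear()
            buf.extend(chunk)       # stands in for the Java-to-native copy / DMA
            ready.put(buf)
        ready.put(None)             # sentinel: no more data

    def compute_stage():
        while True:
            buf = ready.get()
            if buf is None:
                break
            results.append(sum(v * v for v in buf))  # stands in for the GPU kernel
            free.put(buf)           # hand the buffer back to the copy stage

    producer = threading.Thread(target=copy_stage)
    consumer = threading.Thread(target=compute_stage)
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results


print(pipelined_sum_of_squares([[1, 2], [3], [4, 5, 6]]))
```

With `num_buffers=2` the copy stage can prepare the next chunk while the compute stage is still working on the current one; with `num_buffers=1` the two stages serialize, which is the behavior a pipelined design tries to avoid.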

Performance Evaluation Between PC and RaspberryPI Cluster in Apache Spark for Processing Big Data

  • Seo, Ji-Hye;Park, Mi-Rim;Yang, Hye-Kyung;Yong, Hwan-Seung
    • Proceedings of the Korea Information Processing Society Conference / pp.1265-1267 / 2015
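  • With the recent emergence of IoT technology, clusters of Raspberry Pi, a low-power small computer, are being used to process IoT data. As IoT technology advances, diverse data are being generated, and big data processing is now required in IoT environments as well. Hadoop is generally used as the big data processing framework, and Apache Spark has emerged as an alternative to it. In this paper, we compare the performance of a PC and a Raspberry Pi cluster using Apache Spark. The experiments use Yelp data and compare performance in terms of data load time and data processing time with Spark SQL.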

Big Data Astronomy: Let's "PySpark" the Universe (Big Data Astronomy: Analyzing Astronomical Data with PySpark)

  • Hong, Sungryong
    • The Bulletin of The Korean Astronomical Society / v.43 no.1 / pp.63.1-63.1 / 2018
  • Modern large-scale surveys and state-of-the-art cosmological simulations produce various kinds of big data covering millions to billions of galaxies. Inevitably, we need to adopt modern Big Data platforms to handle such large-scale data sets properly. In my talk, I will briefly introduce Apache Spark, the de facto standard among modern Big Data platforms, and present some examples demonstrating how Apache Spark can be used to solve data-driven astronomical problems.

Framework Implementation of Image-Based Indoor Localization System Using Parallel Distributed Computing

  • Kwon, Beom;Jeon, Donghyun;Kim, Jongyoo;Kim, Junghwan;Kim, Doyoung;Song, Hyewon;Lee, Sanghoon
    • The Journal of Korean Institute of Communications and Information Sciences / v.41 no.11 / pp.1490-1501 / 2016
  • In this paper, we propose an image-based indoor localization system using parallel distributed computing. To reduce the computation time for indoor localization, the scale-invariant feature transform (SIFT) algorithm is executed in parallel using Apache Spark. To this end, we propose a novel image processing interface for Apache Spark. The experimental results show that the proposed system is about 3.6 times faster than the conventional system.

Using IoT and Apache Spark Analysis Technique to Monitoring Architecture Model for Fruit Harvest Region (A Monitoring Architecture Model for Poor Fruit-Harvest Areas Using IoT-Based Apache Spark Analysis Techniques)

  • Oh, Jung Won;Kim, Hangkon
    • Smart Media Journal / v.6 no.4 / pp.58-64 / 2017
  • Modern society is characterized by rapid world population growth, an aging rural population, and a shrinking cultivated area due to industrialization, so the food problem is becoming an important issue for farmers and rural communities. Recently, smart-farm research has been actively pursued to increase rural income. Existing smart-farm research mainly monitors the cultivation environment of greenhouse crops; other work studies systems that automatically control environmental factors to keep crops in optimal conditions, for example when quality deteriorates. This research focuses on crops cultivated indoors, and few studies address crops grown outdoors. In this paper, we propose a method that uses big data analysis to monitor areas with poor harvests and improve their harvestability by precisely predicting the harvest timing of fruit trees in orchards. Harvest-related factors, including fruit color and fruit weight, are collected in real time and analyzed with the Apache Spark engine, which performs well for both real-time analysis and high-volume batch analysis. The service supports both PC and smartphone users. An Arduino is used as the sensing device, because receiving sensed data and transmitting it to the server requires only simple processing. To set a harvest time that yields good-quality fruit, it is necessary to identify poor harvest areas and concentrate management on them. We also present an architectural model that identifies poor fruit-harvest areas through powerful data analysis.

A performance comparison for Apache Spark platform on environment of limited memory

  • Song, Jun-Seok;Kim, Sang-Young;Lee, Jung-June;Youn, Hee-Yong
    • Proceedings of the Korean Society of Computer Information Conference / pp.67-68 / 2016
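  • As systems built on big data have recently come into active use in many fields, various distributed system platforms have appeared to compensate for the technical shortcomings of Hadoop, the representative big data storage and processing platform. Among them, Apache Spark is an open-source distributed data processing platform that supports in-memory processing to overcome Hadoop's slowness and process large volumes of data efficiently. However, because Apache Spark jobs depend on memory, overall job performance drops sharply in memory-constrained environments. In this paper, we compare Apache Spark performance across memory capacities to determine the appropriate amount of memory needed to run Apache Spark.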

Processing large-scale data with Apache Spark

  • Ko, Seyoon;Won, Joong-Ho
    • The Korean Journal of Applied Statistics / v.29 no.6 / pp.1077-1094 / 2016
  • Apache Spark is a fast, general-purpose cluster computing package. It provides a new abstraction called the resilient distributed dataset (RDD), which supports fault tolerance while keeping data in memory. This abstraction yields a significant speedup over MapReduce, the legacy large-scale data processing framework. In particular, Spark is well suited to iterative machine learning applications such as logistic regression and K-means clustering, and to interactive data querying. Thanks to its versatility, Spark also provides high-level libraries for machine learning, streaming data processing, database querying, and graph data mining. In this work, we introduce the concepts and programming model of Spark and show implementations of some simple statistical computing applications. We also review the machine learning package MLlib and the R language interface SparkR.
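The abstract notes that Spark suits iterative algorithms such as K-means because data can stay in memory between iterations. A minimal pure-Python sketch of that pattern follows; it does not use Spark, and plain lists stand in for the partitions of a cached RDD. The function name `kmeans_mapreduce`, the 1-D data, and the fixed iteration count are illustrative assumptions, not the paper's code.

```python
from collections import defaultdict


def kmeans_mapreduce(partitions, centers, iterations=10):
    """Hypothetical sketch: 1-D K-means written as map / reduce-by-key steps
    over in-memory partitions (plain lists standing in for a cached RDD)."""
    centers = list(centers)
    for _ in range(iterations):
        # "map" phase: each partition emits cluster index -> [partial sum, count]
        partials = []
        for part in partitions:
            local = defaultdict(lambda: [0.0, 0])
            for x in part:
                k = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
                local[k][0] += x
                local[k][1] += 1
            partials.append(local)
        # "reduce by key" phase: combine partial sums across partitions
        totals = defaultdict(lambda: [0.0, 0])
        for local in partials:
            for k, (s, n) in local.items():
                totals[k][0] += s
                totals[k][1] += n
        # update centers; a center that received no points keeps its position
        centers = [totals[i][0] / totals[i][1] if totals[i][1] else centers[i]
                   for i in range(len(centers))]
    return centers


print(kmeans_mapreduce([[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]], [0.0, 5.0]))
```

Because the same partitions are reread unchanged on every iteration, this is the access pattern that Spark's in-memory caching (for example `RDD.cache()`) speeds up compared with re-reading the data from disk on each MapReduce pass.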

An Implementation of Web-Enabled OLAP Server in Korean HealthCare BigData Platform

  • Ly, Pichponreay;Kim, Jin-Hyuk;Jung, Seung-Hyun;Lee, Kyung-Hee;Cho, Wan-Sup
    • Proceedings of the Korea Contents Association Conference / pp.33-34 / 2017
  • In 2015, the Ministry of Health and Welfare of Korea announced a research and development plan to use Korean healthcare data to support decision making, reduce costs, and improve treatment. The project relies on adopting BigData technologies such as Apache Hadoop and Apache Spark to store and process healthcare data from various institutions. Here we present the design and implementation of an OLAP server in the Korean HealthCare BigData platform. This approach establishes a basis for promoting personalized healthcare research for decision making, disease forecasting, and the development of customized diagnosis and treatment.
