• Title/Summary/Keyword: deduplication

Analysis and Elimination of Side Channels during Duplicate Identification in Remote Data Outsourcing (원격 저장소 데이터 아웃소싱에서 발생하는 중복 식별 과정에서의 부채널 분석 및 제거)

  • Koo, Dongyoung
    • Journal of the Korea Institute of Information Security & Cryptology / v.27 no.4 / pp.981-987 / 2017
  • The proliferation of cloud computing services reduces maintenance and management costs by allowing data to be outsourced to dedicated third-party remote storage. At the same time, most storage service providers have adopted data deduplication techniques for efficient use of storage resources. When a hash tree is employed for duplicate identification as part of the deduplication process, however, the size of the attested data and partial information about the tree can be deduced by eavesdropping. To eliminate such side channels, this paper presents a new duplicate identification method that exploits a multiset hash function.
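
A multiset hash is order-independent and incrementally updatable, so a client can attest a collection of chunks with a single fixed-size value instead of the tree-shaped exchange that leaks size and structure. A minimal sketch of the idea (the additive construction and all names are illustrative assumptions, not the paper's exact scheme):

```python
import hashlib

PRIME = 2**256 - 189  # a 256-bit prime modulus for the additive multiset hash

def chunk_digest(chunk: bytes) -> int:
    """Map one chunk to an integer in [0, PRIME)."""
    return int.from_bytes(hashlib.sha256(chunk).digest(), "big") % PRIME

def multiset_hash(chunks) -> int:
    """Order-independent hash: the sum of per-chunk digests mod PRIME.
    Two multisets of chunks hash equal iff (with overwhelming probability)
    they contain exactly the same chunks, in any order."""
    acc = 0
    for c in chunks:
        acc = (acc + chunk_digest(c)) % PRIME
    return acc

# The server compares one fixed-size value per file, so an eavesdropper no
# longer observes a hash-tree traversal that reveals data size or tree shape.
stored = [b"chunk-A", b"chunk-B", b"chunk-C"]
uploaded = [b"chunk-C", b"chunk-A", b"chunk-B"]  # same chunks, different order
assert multiset_hash(stored) == multiset_hash(uploaded)
```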

Privacy Preserving Source-Based Deduplication Method (프라이버시 보존형 소스기반 중복제거 기술 방법 제안)

  • Nam, Seung-Soo;Seo, Chang-Ho;Lee, Joo-Young;Kim, Jong-Hyun;Kim, Ik-Kyun
    • Smart Media Journal / v.4 no.4 / pp.33-38 / 2015
  • Cloud storage servers cannot detect duplicates among conventionally encrypted data. Convergent encryption has been proposed to solve this problem, and various client-side deduplication technologies have followed, but these proposals still do not resolve the security problem. In this paper, we suggest a secure source-based deduplication technique that encrypts data to ensure the confidentiality of sensitive data and applies a proofs-of-ownership protocol to control access to the data, protecting it from a curious cloud server and malicious users.
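
Convergent encryption, mentioned above, derives the key from the plaintext itself, so identical files encrypt to identical ciphertexts and the server can deduplicate data it cannot read. A minimal sketch, assuming SHA-256 key derivation and AES-GCM with a key-derived nonce (illustrative choices, not the paper's exact construction):

```python
import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # pip install cryptography

def convergent_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    """Encrypt deterministically: key = H(plaintext), nonce = H(key)[:12].
    Identical plaintexts always yield identical ciphertexts, which is what
    lets the server detect duplicates of encrypted data."""
    key = hashlib.sha256(plaintext).digest()   # convergent key
    nonce = hashlib.sha256(key).digest()[:12]  # deterministic nonce (one key per plaintext)
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
    return key, ciphertext

def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    nonce = hashlib.sha256(key).digest()[:12]
    return AESGCM(key).decrypt(nonce, ciphertext, None)

k1, c1 = convergent_encrypt(b"same sensitive file")
k2, c2 = convergent_encrypt(b"same sensitive file")
assert c1 == c2  # duplicates are visible to the server...
assert convergent_decrypt(k1, c1) == b"same sensitive file"  # ...but readable only with the key
```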

Privacy Preserving Source-Based Deduplication Method (프라이버시 보존형 소스기반 중복제거 방법)

  • Nam, Seung-Soo;Seo, Chang-Ho
    • Journal of Digital Convergence / v.14 no.2 / pp.175-181 / 2016
  • Cloud storage servers cannot detect duplicates among conventionally encrypted data. Convergent encryption has been proposed to solve this problem, and various client-side deduplication technologies have followed, but these proposals still cannot solve the security problem. In this paper, we suggest a secure source-based deduplication technique that encrypts data to ensure the confidentiality of sensitive data and applies a proofs-of-ownership protocol to control access to the data, protecting it from a curious cloud server and malicious users.
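
The proofs-of-ownership step in both abstracts is what stops a malicious user who has only seen a file's short hash from claiming the whole file. A simplified challenge-response sketch (published PoW protocols, e.g. Merkle-tree-based ones, are more involved; the block size and class names here are assumptions):

```python
import hashlib
import os
import secrets

BLOCK = 4096  # assumed block size

def blocks(data: bytes):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

class Server:
    """Already holds the file; challenges uploaders who claim to own it."""
    def __init__(self, data: bytes):
        self.blocks = blocks(data)

    def challenge(self, k: int = 3):
        # Fresh nonce + random block indices, so a client cannot replay old
        # answers or succeed with only the file's overall hash.
        self.nonce = secrets.token_bytes(16)
        self.indices = [secrets.randbelow(len(self.blocks)) for _ in range(k)]
        return self.nonce, self.indices

    def verify(self, answers) -> bool:
        expected = [hashlib.sha256(self.nonce + self.blocks[i]).digest()
                    for i in self.indices]
        return answers == expected

def client_respond(data: bytes, nonce: bytes, indices):
    bs = blocks(data)
    return [hashlib.sha256(nonce + bs[i]).digest() for i in indices]

data = os.urandom(5 * BLOCK)
server = Server(data)
nonce, idx = server.challenge()
assert server.verify(client_respond(data, nonce, idx))  # the real owner passes
```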

Design of a Deduplication User File System for Flash-SSD (Flash-SSD 데이터 중복 제거를 위한 사용자 파일 시스템 설계)

  • Myeong, Jae-hui;Kwon, Oh-young
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.10a / pp.322-325 / 2017
  • Due to the rapid increase in data, various studies are being conducted on managing data efficiently. By 2025, the total amount of data is projected to exceed 163 ZB, and more than a quarter of it will be real-time data. As mass storage devices shift from HDDs to SSDs, SSDs need their own way to manage data effectively. In this paper, we study the SSD system structure and deduplication-based data management methods for Flash-SSDs, and we propose an application-level user file system that uses deduplication. It is expected to save storage capacity and to minimize the performance degradation caused by unnecessary write traffic.
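
A toy content-addressed store showing the kind of application-level deduplication the abstract proposes (the names and the 4 KB chunk size are assumptions; the paper's design addresses Flash-SSD specifics this sketch ignores):

```python
import hashlib

class DedupStore:
    """File recipes map to chunk hashes; each unique chunk is stored once,
    so a duplicate write costs only a reference and never reaches the SSD."""
    CHUNK = 4096

    def __init__(self):
        self.chunks = {}   # sha256 hex -> bytes (unique chunk payloads)
        self.recipes = {}  # filename -> list of chunk hashes

    def write(self, name: str, data: bytes):
        recipe = []
        for i in range(0, len(data), self.CHUNK):
            chunk = data[i:i + self.CHUNK]
            h = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(h, chunk)  # store only the first occurrence
            recipe.append(h)
        self.recipes[name] = recipe

    def read(self, name: str) -> bytes:
        return b"".join(self.chunks[h] for h in self.recipes[name])

store = DedupStore()
store.write("a.log", b"x" * 8192)
store.write("b.log", b"x" * 8192)   # duplicate content
assert len(store.chunks) == 1       # stored once
assert store.read("b.log") == b"x" * 8192
```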

Secure and Efficient Storage of Video Data in a CCTV Environment

  • Kim, Won-Bin;Lee, Im-Yeong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.6 / pp.3238-3257 / 2019
  • Closed-circuit television (CCTV) technology continuously captures and stores video streams. Users are typically required by policy to store all captured video for a certain period. Accordingly, increasing the number of CCTV operation cycles and camera positions expands the amount of data to be stored. However, expanding the available storage space for video data incurs increased costs. In recent years, this problem has been addressed with cloud storage solutions, which enable multiple users and devices to access and store data simultaneously. Because of the large amount of data involved, though, a vast storage space is still required, so cloud storage administrators need a way to store data more efficiently. To save storage space, deduplication has been proposed to prevent duplicate storage of the same data. Because cloud storage is hosted on remote servers, however, data encryption must also be applied to address data exposure issues. Deduplication techniques for encrypted data have been studied, but they have exhibited various security vulnerabilities. We attempt to solve this problem by addressing issues such as poison attacks, property forgery, and ownership management while removing redundant data and handling the data more securely.

Block Separation Technique for Offline Deduplication on Solid State Drives (SSD에서 오프라인 중복 데이터 제거를 위한 플래시 메모리 블록 구분 기법)

  • Kang, Yun-Ji;An, Jeong-Choel;Shin, Dong-Kun
    • Proceedings of the Korean Information Science Society Conference / 2012.06a / pp.379-381 / 2012
  • Deduplication allows storage space to be used efficiently and has therefore been widely used in conventional storage systems. Recently, many deduplication schemes for flash-memory-based SSDs have also been proposed, but they fail to consider the characteristics of flash memory. Targeting offline deduplication, this paper proposes a technique that reduces garbage collection cost by taking SSD characteristics into account: data that is likely to be duplicated and data that is not are separated online and written to different regions of flash memory, which improves garbage collection performance after offline deduplication. Experimental results show that the proposed technique reduces the number of page migrations, the dominant garbage collection cost, by more than about 80%.
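
A rough sketch of the separation idea (the seen-before fingerprint heuristic below is a stand-in for whatever online classifier the paper uses): likely-duplicate pages are routed to their own flash region, so offline deduplication later invalidates pages that sit together and garbage collection migrates fewer live pages.

```python
import hashlib

class SeparatingAllocator:
    """Routes each written page to a 'likely-duplicate' or 'unique' region.
    Heuristic: a page whose fingerprint has been seen before is likely a
    duplicate and will probably be invalidated by offline deduplication."""
    def __init__(self):
        self.seen = set()
        self.dup_region = []     # pages offline dedup will mostly invalidate
        self.unique_region = []  # pages that will mostly stay live

    def write(self, page: bytes):
        fp = hashlib.sha256(page).digest()[:8]  # short fingerprint
        if fp in self.seen:
            self.dup_region.append(page)
        else:
            self.seen.add(fp)
            self.unique_region.append(page)

alloc = SeparatingAllocator()
for p in [b"A" * 4096, b"B" * 4096, b"A" * 4096, b"A" * 4096]:
    alloc.write(p)
# Duplicates cluster in one region, so erasing its blocks after offline
# dedup requires migrating few (ideally no) live pages.
assert len(alloc.dup_region) == 2 and len(alloc.unique_region) == 2
```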

Systematic Review of Bug Report Processing Techniques to Improve Software Management Performance

  • Lee, Dong-Gun;Seo, Yeong-Seok
    • Journal of Information Processing Systems / v.15 no.4 / pp.967-985 / 2019
  • Bug report processing is a key element of bug fixing in modern software maintenance. Bug reports are not processed immediately after submission; they pass through several stages, such as deduplication and triage, before bug fixing is initiated, and this way of working is very inefficient because all of these stages are performed manually. Software engineers have persistently highlighted the need to automate these processes, and many automation techniques have been proposed, but the accuracy of the existing methods is not satisfactory. This study therefore surveys existing bug report processing techniques with a view to improving their accuracy. The review of each method consists of a description, the techniques used, the experiments, and comparison results. The results indicate that research on bug report deduplication is still lacking and calls for studies that integrate clustering and natural language processing, and that although all studies on triage are based on machine learning, results from deep learning remain insufficient.
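
As a concrete instance of the clustering-plus-NLP direction the survey calls for, a minimal duplicate-report detector using TF-IDF cosine similarity (the threshold, sample data, and use of scikit-learn are illustrative assumptions):

```python
# pip install scikit-learn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reports = [
    "App crashes when opening settings page",
    "Crash on opening the settings screen",
    "Dark mode colors are wrong on the login page",
]

# Vectorize report texts, then compare the newest report against the rest.
vec = TfidfVectorizer(stop_words="english")
matrix = vec.fit_transform(reports)
sims = cosine_similarity(matrix[0], matrix[1:]).ravel()

THRESHOLD = 0.3  # assumed cutoff; real triage systems tune this empirically
for i, s in enumerate(sims, start=1):
    if s >= THRESHOLD:
        print(f"report 0 is a likely duplicate of report {i} (similarity {s:.2f})")
```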

Data Deduplication Method using Locality-based Chunking policy for SSD-based Server Storages (SSD 기반 서버급 스토리지를 위한 지역성 기반 청킹 정책을 이용한 데이터 중복 제거 기법)

  • Lee, Seung-Kyu;Kim, Ju-Kyeong;Kim, Deok-Hwan
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.2 / pp.143-151 / 2013
  • NAND flash-based SSDs (Solid State Drives) have the advantages of fast I/O performance and low power consumption, so they are widely used as storage in tablets, desktop PCs, smartphones, and servers. However, SSDs suffer from limited write endurance, as flash cells wear out as the number of writes increases. To improve SSD lifespan, a variety of data deduplication techniques have been introduced. A general fixed-size chunking method allocates fixed-size chunks without considering the locality of the data, so it may perform unnecessary chunking and hash-key generation, while a variable-size chunking method incurs excessive computation because it compares data byte by byte for deduplication. This paper proposes an adaptive chunking method based on the application locality and file-name locality of data written to SSD-based server storage. The proposed method splits data into 4 KB or 64 KB chunks adaptively according to the application locality and file-name locality of duplicated data, reducing the overhead of chunking and hash-key generation and preventing duplicate writes. Experimental results show that the proposed method improves write performance and reduces power consumption and operation time compared with the existing variable-size chunking method and a fixed-size chunking method using 4 KB chunks.
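
A sketch of the file-name-locality half of the adaptive policy (the extension-to-chunk-size mapping is an assumption for illustration): file types that deduplicate well get fine 4 KB chunks, while types that rarely deduplicate get coarse 64 KB chunks to cut chunking and hashing overhead.

```python
import hashlib

SMALL, LARGE = 4 * 1024, 64 * 1024
# Assumed mapping: text/log/VM-image files dedup well -> fine 4 KB chunks;
# everything else -> coarse 64 KB chunks, i.e. far fewer hash computations.
FINE_EXTENSIONS = {".txt", ".log", ".vmdk", ".doc"}

def chunk_size_for(filename: str) -> int:
    ext = filename[filename.rfind("."):].lower()
    return SMALL if ext in FINE_EXTENSIONS else LARGE

def dedup_write(filename: str, data: bytes, store: dict) -> int:
    """Returns how many bytes actually had to be written to flash."""
    size = chunk_size_for(filename)
    written = 0
    for i in range(0, len(data), size):
        chunk = data[i:i + size]
        key = hashlib.sha256(chunk).hexdigest()
        if key not in store:   # only unseen chunks reach the SSD
            store[key] = chunk
            written += len(chunk)
    return written

store: dict = {}
dedup_write("backup.log", b"z" * (128 * 1024), store)        # 4 KB chunks
saved = dedup_write("copy.log", b"z" * (128 * 1024), store)  # all duplicates
assert saved == 0
```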

SSD Assisted Recovery Efficiency Optimization System Based on Deduplication Method in the Cloud (클라우드 환경에서 중복 제거 기법을 적용한 SSD 기반의 회복 효율성 최적화 시스템 설계)

  • Kim, Min-Jae;Kim, Kyung-Tae;Youn, Hee-Young
    • Proceedings of the Korean Society of Computer Information Conference / 2014.07a / pp.223-226 / 2014
  • As the use of cloud computing and mobile communication services grows rapidly, data is increasing exponentially. SSDs (Solid State Disks), which consume little power and offer excellent data-access performance, are in the spotlight as storage devices for such data. An SSD is a high-capacity device that attaches multiple NAND flash memories and executes the commands requested by the host. SSDs are widely used in the market thanks to advantages such as non-volatility, high performance, durability, and low power consumption. Despite these advantages, however, NAND flash memory has inherent drawbacks: asymmetric read, write, and erase latencies; mismatched operation units; no in-place overwrite; and a limited number of erase cycles per block. In particular, the limited per-block erase count affects SSD lifespan; once it is exceeded, reliability drops sharply and the block becomes unusable. This paper therefore designs an SSD-based recovery-efficiency optimization system that applies deduplication to improve performance under the limited erase count of NAND flash blocks in a cloud environment.

Performance Analysis and Improvement of WANProxy (WANProxy의 성능 분석 및 개선)

  • Kim, Haneul;Ji, Seungkyu;Chung, Kyusik
    • KIPS Transactions on Computer and Communication Systems / v.9 no.3 / pp.45-58 / 2020
  • With network traffic increasing due to the popularization of cloud services and mobile devices, WAN bandwidth remains very low compared to LAN bandwidth. In a WAN environment, a WAN optimizer is needed to overcome the performance problems caused by the transmission protocol, packet loss, and limited network bandwidth. In this paper, we analyze the data deduplication algorithm of WANProxy, an open-source WAN optimizer, and evaluate its performance in terms of network latency and WAN bandwidth. We also evaluate the performance of two-stage compression combining WANProxy and Zstandard, and we propose and evaluate a method that improves WANProxy by revising its data deduplication algorithm. Experiments use the 12 data files of Silesia with a data segment size of 2048 bytes. The results show that WANProxy achieves an average compression rate of 150.6 and reduces average network latency by 95.2% in a 10 Mbps WAN environment and by 60.7% in a 100 Mbps WAN environment. Compared with WANProxy alone, two-stage compression with Zstandard increases the average compression rate by 33%, but it increases average network latency by 2.1% at 10 Mbps and 5.27% at 100 Mbps. Compared with WANProxy, our proposed method increases the average compression rate by 34.8% and reduces average network latency by 13.8% for a 10 Mbps WAN and 12.9% for a 100 Mbps WAN. Overall, the analysis shows that WANProxy's improvement in network latency and WAN bandwidth is greatest in WAN environments of 10 Mbps or less, while remaining substantial at 100 Mbps.
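
A toy version of the segment-level deduplication WANProxy applies (2048-byte segments as in the experiments above; the cache protocol and names are assumptions), with Zstandard compressing only the segments that miss the cache, mirroring the two-stage idea evaluated in the paper:

```python
import hashlib
import zstandard  # pip install zstandard

SEGMENT = 2048
cctx = zstandard.ZstdCompressor()

def encode_stream(data: bytes, cache: set) -> list:
    """Replace segments both endpoints have already seen with 8-byte
    references; compress only the cold segments before sending them."""
    out = []
    for i in range(0, len(data), SEGMENT):
        seg = data[i:i + SEGMENT]
        ref = hashlib.sha256(seg).digest()[:8]
        if ref in cache:
            out.append(("ref", ref))                 # 8 bytes instead of 2048
        else:
            cache.add(ref)
            out.append(("raw", cctx.compress(seg)))  # stage 2: zstd on misses
    return out

cache: set = set()
first = encode_stream(b"A" * 2048 + b"B" * 2048, cache)   # cold: raw segments
second = encode_stream(b"A" * 2048 + b"B" * 2048, cache)  # warm: all references
assert all(kind == "ref" for kind, _ in second)
```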