Data Deduplication Method using Locality-based Chunking policy for SSD-based Server Storages

Received: 2012.10.19
Published: 2013.02.25

Abstract

NAND flash-based SSDs (Solid State Drives) offer fast input/output performance and low power consumption, so they are widely used as storage in tablets, desktop PCs, smartphones, and servers. However, SSDs wear out as the number of writes increases. To extend SSD lifespan, a variety of data deduplication techniques have been introduced. Conventional fixed-size splitting allocates chunks of a fixed size without considering data locality, so it may perform unnecessary chunking and hash-key generation; variable-size splitting, on the other hand, incurs excessive computation because it compares data byte by byte for deduplication. This paper proposes an adaptive chunking method for SSD-based server storage that is based on the application locality and file-name locality of written data. The proposed method adaptively splits data into 4KB or 64KB chunks according to the application and file-name locality of duplicate data, which reduces the overhead of chunking and hash-key generation and prevents duplicate data from being written. Experimental results show that, compared with existing deduplication using variable-size splitting and 4KB fixed-size splitting, the proposed method improves SSD write performance and reduces power consumption and computation time.
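The abstract does not say how locality is tracked or exactly when a 64KB chunk is preferred, so the following is only a minimal Python sketch under stated assumptions. It assumes locality is recorded per (application, file name) pair, that a pair becomes "hot" once one of its writes is at least half duplicate chunks (the hypothetical `DUP_THRESHOLD`), and that hot pairs are chunked at 64KB while all others use 4KB. SHA-1 stands in for the hash-key function, and the class `AdaptiveChunkingDedup` and its counters are illustrative names, not the paper's implementation.

```python
import hashlib

SMALL_CHUNK = 4 * 1024    # 4KB chunk size named in the abstract
LARGE_CHUNK = 64 * 1024   # 64KB chunk size named in the abstract
DUP_THRESHOLD = 0.5       # hypothetical: duplicate fraction that marks a key as "hot"


class AdaptiveChunkingDedup:
    """Toy in-memory model of locality-based adaptive chunking (hypothetical design)."""

    def __init__(self):
        self.store = {}         # hash key -> chunk bytes (stands in for flash pages)
        self.hot_keys = set()   # (app, file_name) pairs that have shown duplicate locality
        self.hash_ops = 0       # number of hash keys generated so far
        self.bytes_written = 0  # bytes of unique chunks actually written

    def chunk_size_for(self, app, file_name):
        # Assumption: coarse 64KB chunks for hot keys, fine 4KB chunks otherwise.
        return LARGE_CHUNK if (app, file_name) in self.hot_keys else SMALL_CHUNK

    def write(self, app, file_name, data):
        size = self.chunk_size_for(app, file_name)
        keys, duplicates = [], 0
        for off in range(0, len(data), size):
            chunk = data[off:off + size]
            key = hashlib.sha1(chunk).hexdigest()
            self.hash_ops += 1
            if key in self.store:
                duplicates += 1          # duplicate chunk: skip the flash write
            else:
                self.store[key] = chunk  # unique chunk: write it once
                self.bytes_written += len(chunk)
            keys.append(key)
        # Assumption: a key stays hot once any write is mostly duplicates.
        if keys and duplicates / len(keys) >= DUP_THRESHOLD:
            self.hot_keys.add((app, file_name))
        return keys  # hash keys in logical order (the write's chunk map)


if __name__ == "__main__":
    dedup = AdaptiveChunkingDedup()
    # 256KB payload built from 64 distinct 4KB blocks
    payload = b"".join(i.to_bytes(4, "big") * 1024 for i in range(64))
    for n in range(4):
        size = dedup.chunk_size_for("db", "table.dat")
        dedup.write("db", "table.dat", payload)
        print(f"write {n + 1}: chunk={size // 1024}KB hash_ops={dedup.hash_ops}")
```

In this toy run, four identical 256KB writes cost 136 hash-key computations, against 256 under a fixed 4KB policy. The run also exposes a cost the sketch does not hide: the first 64KB write stores the payload again, because chunks of different sizes are indexed separately. The abstract does not say how the paper handles that transition.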

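For contrast, the variable-size splitting that the abstract criticizes is commonly implemented as content-defined chunking: a rolling hash is updated at every byte position, and a chunk boundary is declared when the hash matches a bit pattern. The minimal sketch below uses a Rabin-Karp-style polynomial rolling hash as a stand-in for true Rabin fingerprinting; the window length, mask, and minimum/maximum chunk sizes are hypothetical choices, not values from the paper. Its point is the `updates` counter: boundary detection alone costs one hash update per input byte, before any chunk is fingerprinted for deduplication, whereas fixed-size splitting finds boundaries by simple offset arithmetic.

```python
import random

WINDOW = 48              # sliding-window length in bytes (hypothetical)
BASE = 257               # polynomial base for the rolling hash
MOD = (1 << 61) - 1      # large prime modulus
MASK = (1 << 12) - 1     # cut when the low 12 bits are zero (about 1 in 4096 positions)
MIN_CHUNK = 1024         # hypothetical lower bound on chunk size
MAX_CHUNK = 16 * 1024    # hypothetical upper bound on chunk size


def cdc_chunks(data: bytes):
    """Content-defined chunking; returns (chunks, rolling_hash_updates)."""
    chunks = []
    start = 0
    h = 0
    out_weight = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    updates = 0
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            # drop the oldest byte so h covers only the last WINDOW bytes
            h = (h - data[i - WINDOW] * out_weight) % MOD
        h = (h * BASE + byte) % MOD
        updates += 1  # one hash update per input byte: the per-byte cost
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0  # next chunk starts with an empty window
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks, updates


if __name__ == "__main__":
    rng = random.Random(42)
    data = bytes(rng.getrandbits(8) for _ in range(256 * 1024))
    chunks, updates = cdc_chunks(data)
    print(f"{len(chunks)} variable-size chunks, {updates} rolling-hash updates")
```

Because the hash state resets at each cut and `MIN_CHUNK` exceeds `WINDOW`, boundaries depend only on content, so inserting bytes early in a file typically shifts only nearby boundaries. That resilience is what variable-size splitting buys in exchange for its per-byte work.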