A Distributed Cache Management Scheme for Efficient Accesses of Small Files in HDFS


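  • 오현교 (School of Information and Communication Engineering, Chungbuk National University) ;
  • 김기연 (School of Information and Communication Engineering, Chungbuk National University) ;
  • 황재민 (School of Information and Communication Engineering, Chungbuk National University) ;
  • 박준호 (1st Technology Research Division, Agency for Defense Development) ;
  • 임종태 (School of Information and Communication Engineering, Chungbuk National University) ;
  • 복경수 (School of Information and Communication Engineering, Chungbuk National University) ;
  • 유재수 (School of Information and Communication Engineering, Chungbuk National University)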
  • Received : 2014.09.15
  • Accepted : 2014.10.14
  • Published : 2014.11.28

Abstract

In this paper, we propose a distributed cache management scheme for efficiently accessing small files in the Hadoop Distributed File System (HDFS). The proposed scheme reduces the amount of metadata managed by the name node by merging many small files and storing them in a chunk. It also reduces file access costs by keeping information about requested files in a client cache and in data node caches. The client cache holds the small files that a user has requested, together with their metadata. Each data node cache holds the small files that users request frequently. A performance evaluation shows that the proposed scheme significantly reduces processing time compared with the existing scheme.
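The access path described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class and function names (`Chunk`, `LRUCache`, `read_small_file`), the LRU eviction policy, and the rule that every chunk read populates the data node cache are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import OrderedDict


class Chunk:
    """Hypothetical merged chunk: many small files packed into one buffer."""

    def __init__(self):
        self.data = bytearray()
        self.index = {}  # file name -> (offset, length) inside the chunk

    def append(self, name, payload):
        # Record where the file starts, then append its bytes to the chunk.
        self.index[name] = (len(self.data), len(payload))
        self.data.extend(payload)

    def read(self, name):
        offset, length = self.index[name]
        return bytes(self.data[offset:offset + length])


class LRUCache:
    """Small LRU cache; the eviction policy is an assumption, not from the paper."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used


def read_small_file(name, client_cache, node_cache, chunk, trace):
    """Try the client cache, then the data node cache, then the merged chunk."""
    data = client_cache.get(name)
    if data is not None:
        trace.append("client")
        return data
    data = node_cache.get(name)
    if data is None:
        data = chunk.read(name)
        node_cache.put(name, data)  # simplification: cache every chunk read
        trace.append("chunk")
    else:
        trace.append("node")
    client_cache.put(name, data)
    return data
```

In this sketch the name node would only need one metadata entry per chunk, while the per-file offsets live in the chunk index. The `trace` list exists only to make the lookup order observable; a real data node cache would presumably admit files based on request frequency rather than on every read.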

Keywords

Hadoop Distributed File System; Small File; Distributed Cache; Cache Metadata

Acknowledgement

Supported by: National Research Foundation of Korea
