Design and Implementation of Adaptive Fault-Tolerant Management System over Grid

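  • Eun-kyung Kim (SK Communications)
  • Ji-young Kim (Sookmyung Women's University, Computer Science)
  • Yoonhee Kim (Sookmyung Women's University, Department of Computer Science)
  • Published: 2008.06.30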

Abstract

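Grid computing environments, in which execution conditions change frequently through service migration and changes in resource state, require middleware that supports high availability in order to serve diverse application workloads and guarantee users an uninterrupted working environment. Some researchers have studied high-availability support services for existing distributed middleware, but these services are not open standards and do not take the variety of grid services into account. This paper presents a method in which an environment-adaptive runtime service management system for middleware autonomously reconfigures the working environment, increasing the availability of the middleware and reliably guaranteeing service continuity and data consistency. We confirm its applicability in a real environment through the prototype Wapee (Web-Service based Application Execution Environment).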

Middleware in a grid computing environment must support seamless on-demand services across diverse resource conditions in order to meet various user requirements [1], and grid computing applications therefore need situation-aware middleware services. In this paper, we propose a semantic middleware architecture that supports dynamic software component reconfiguration based on fault and service ontologies to provide fault tolerance in a grid computing environment. Our middleware includes autonomic management that detects faults, analyzes their causes, and plans semantically meaningful strategies to recover from failures using predefined fault and service ontology trees. We implemented a reference prototype, Web-service based Application Execution Environment (Wapee), as a proof of concept and showed its efficiency in runtime recovery.

Keywords

References

  1. M. Weiser, “The computer for the 21st Century,” Scientific American, vol.265, no.3, pp.94-104, September, 1991
  2. Satish Tadepalli, Calvin Ribbens, Srinidhi Varadarajan, “GEMS: A Job Management System for Fault Tolerant Grid Computing”, High Performance Computing Symposium, 2004
  3. Jang-uk In, Paul Avery, Richard Cavanaugh, Laukik Chitnis, Mandar Kulkarni, Sanjay Ranka, “SPHINX: A Fault-Tolerant System for Scheduling in Dynamic Grid Environments”, 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05), p.12b, 2005
  4. Condor/DAGMan http://www.cs.wisc.edu/condor/dagman
  5. Zbigniew Kalbarczyk, Ravishankar K Iyer, Long Wang, “Application Fault Tolerance with Armor Middleware”, Internet Computing, Vol.9, No.2, pp.28-37, March/April, 2005 https://doi.org/10.1109/MIC.2005.31
  6. P. Narasimhan, C. F. Reverte, S. Ratanotayanon and G. S. Hartman, “Middleware for Embedded Adaptive Dependability”, IEEE Workshop on Large Scale Real-Time and Embedded Systems, Austin, TX, December, 2002
  7. Eduardo Ostertag, James Hendler, Ruben Prieto Diaz, Christine Braun, “Computing similarity in a reuse library system: an AI-based approach”, ACM Trans. Softw. Eng. Methodol., vol.1, no.3, pp.205-228, 1992 https://doi.org/10.1145/131736.131739
  8. R. Prieto-Diaz, P. Freeman, “Classifying Software for Reuse”, IEEE Software, 4(1), pp.6-16, 1987 https://doi.org/10.1109/MS.1987.229789
  9. Hwayoun Lee, Ho-Jin Choi, In-Young Ko, “A Semantically-Based Software Component Selection Mechanism for Intelligent Service Robots”, Proceedings of 4th Mexican International Conference on Artificial Intelligence (MICAI2005), Monterrey, Mexico, November, 2005
  10. Yoonhee Kim, Eun-kyung Kim, Beom-Jun Jeon, In-Young Ko, and Sung-Yong Park, “Wapee: A Fault-Tolerant Semantic Middleware in Ubiquitous Computing Environments”, EUC Workshops, IFIP International Federation for Information Processing, LNCS 4097, Seoul, August, 2006