An Efficient Record-Replay Mechanism using Hardware Performance Counters and Debugging Facilities

A Record-Replay Method Using Hardware Performance Counters and Debugging Facilities

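  • Ji-Chan Maeng (맹지찬; Department of Electronics, Computer and Communication Engineering, Hanyang University) ;
  • Minsoo Ryu (유민수; Division of Computer Science and Engineering, Hanyang University)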
  • Received : 2011.03.24
  • Accepted : 2011.05.25
  • Published : 2011.10.31

Abstract

In this paper, we present a record-replay technique based on logging and reproducing interrupts. Conventional record-replay approaches treat race conditions as the main source of nondeterminism. However, interrupts are another source of nondeterministic system behavior, and they must be reproduced not only in the same order but also at exactly the same points in execution. We show that an interrupt-based replayer can be implemented efficiently and effectively using hardware performance counters and debugging facilities. Experiments also show that the runtime overhead of the interrupt-based replayer is sufficiently low.

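This paper proposes a record-replay technique that reproduces a software execution identically by recording and replaying interrupts. Traditional record-replay methods regard race conditions as the principal source of nondeterminism, and therefore record events such as entry to and exit from critical sections, shared-memory accesses, and message exchanges, and then replay them in the same order. However, interrupts are also an important nondeterministic factor that can affect program execution, and for interrupts it is necessary to reproduce not only the order of occurrence but also the exact point at which each one occurs. We therefore propose a method that uses the performance counters and debugging facilities provided by the processor hardware to record and replay the occurrence points of interrupts precisely.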
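
The difference between reproducing interrupt order and reproducing interrupt timing can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: `ToyCPU`, `record`, and `replay` are hypothetical names, and the `retired` field merely stands in for a hardware performance counter of retired instructions. On real hardware, the replay side would presumably arm a counter overflow or a hardware breakpoint instead of comparing the counter on every step.

```python
from dataclasses import dataclass


@dataclass
class ToyCPU:
    """Deterministic toy machine (an illustrative assumption, not real hardware).

    The main program adds 1 to `acc` on every step; the interrupt handler
    doubles `acc`, so the final value depends on when the handler runs.
    """
    acc: int = 0
    retired: int = 0  # stand-in for a retired-instruction performance counter

    def step(self) -> None:
        self.acc += 1
        self.retired += 1

    def interrupt(self) -> None:
        self.acc *= 2


def record(total_steps: int, arrival_points: list[int]) -> tuple[int, list[int]]:
    """Run with interrupts arriving at the given counter values and log them."""
    cpu = ToyCPU()
    pending = set(arrival_points)
    log: list[int] = []
    while cpu.retired < total_steps:
        if cpu.retired in pending:
            log.append(cpu.retired)  # read the counter at delivery time
            cpu.interrupt()
        cpu.step()
    return cpu.acc, log


def replay(total_steps: int, log: list[int]) -> int:
    """Re-run the program, injecting each interrupt at its logged counter value."""
    cpu = ToyCPU()
    points = iter(log)
    target = next(points, None)  # next logged injection point, if any
    while cpu.retired < total_steps:
        if cpu.retired == target:
            cpu.interrupt()
            target = next(points, None)
        cpu.step()
    return cpu.acc


if __name__ == "__main__":
    final, log = record(10, [3, 7])
    print("recorded:", final, log)
    print("exact replay:", replay(10, log))
    print("same order, shifted timing:", replay(10, [2, 7]))
```

Replaying the logged points reproduces the recorded result, while delivering the first interrupt one instruction early (same order, different timing) yields a different final value. This is the behavior that order-only replay cannot capture.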
