Performance Optimization Considering I/O Data Coherency in Stream Processing

  • Received : 2015.09.04
  • Accepted : 2016.07.28
  • Published : 2016.08.25

Abstract

This paper optimizes the performance of applications that process massive stream data, taking into account the I/O data coherency problem that arises when memory is shared between processors and hardware accelerators and the memory accessed by the accelerators is changed from non-cacheable to cacheable. A formula for performance analysis is derived from the profiling results of system-level simulations. Experimental results show that making the shared memory cacheable improved overall performance by 1.40 times on average across various image sizes. Further optimization was then performed on the main parameters appearing in the derived formula. The final performance gain was 3.88 times compared with the original design, and the results show that a design with cacheable shared memory does not always perform better.
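
The closing observation, that cacheable shared memory is not always faster, can be illustrated with a toy cost model. The sketch below is not the formula derived in the paper, which this page does not reproduce. Every function name, parameter, and cycle count is a hypothetical assumption chosen only for illustration. It assumes coherency is kept in software, so the whole shared buffer needs cache maintenance (clean or invalidate) whenever it is handed over to or from an accelerator.

```python
from math import ceil

# Toy cycle-count model (hypothetical; NOT the paper's derived formula).
# All latencies below are illustrative assumptions, not measured values.

def noncacheable_cycles(touched_words, reuse, t_uncached):
    """Every CPU access goes to memory at the uncached latency."""
    return touched_words * reuse * t_uncached

def cacheable_cycles(touched_words, reuse, buffer_words, words_per_line,
                     t_hit, t_miss, t_maint):
    """First touch of each line misses, later accesses hit, and the
    whole shared buffer pays a per-line cache maintenance cost."""
    accesses = touched_words * reuse
    touched_lines = ceil(touched_words / words_per_line)
    buffer_lines = ceil(buffer_words / words_per_line)
    misses = min(touched_lines, accesses)
    hits = accesses - misses
    return misses * t_miss + hits * t_hit + buffer_lines * t_maint

# Assumed latencies in cycles and an assumed 8-word cache line.
LAT = dict(words_per_line=8, t_hit=1, t_miss=30, t_maint=40)

# Dense case: the CPU reads every word of a 1024-word buffer four times.
dense_nc = noncacheable_cycles(1024, 4, t_uncached=20)
dense_c = cacheable_cycles(1024, 4, buffer_words=1024, **LAT)

# Sparse case: the CPU reads only 16 words once; accelerators do the rest.
sparse_nc = noncacheable_cycles(16, 1, t_uncached=20)
sparse_c = cacheable_cycles(16, 1, buffer_words=1024, **LAT)

print(dense_nc, dense_c)    # cacheable is cheaper here
print(sparse_nc, sparse_c)  # maintenance cost dominates here
```

Under these assumed numbers, the cacheable configuration wins when the processor reuses the data heavily, while the maintenance cost outweighs the cache benefit when the processor touches only a small part of a buffer that is mostly handled by accelerators.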
