Reliability Assessment of Low-Power Processor Packages for Supercomputers

  • Park, Ju-Young (Dept. of System Design and Control Engineering, Ulsan National Institute of Science and Technology) ;
  • Kwon, Daeil (Dept. of System Design and Control Engineering, Ulsan National Institute of Science and Technology) ;
  • Nam, Dukyun (Supercomputer SW Research Lab, Supercomputer Development Center, Korea Institute of Science and Technology Information)
  • Received : 2016.03.31
  • Accepted : 2016.04.26
  • Published : 2016.06.30

Abstract

As rising electricity prices push up datacenter operating costs, many researchers are studying supercomputers built on low-power processors to reduce datacenter power consumption. The reliability of low-power processors in supercomputers can be a concern. Many low-power processors have their reliability assessed under mobile-device use conditions, and the comparatively harsh supercomputer operating environment may cause physical and mechanical reliability problems. This paper assesses the reliability of low-power processor packages under supercomputer use conditions. A literature survey and a failure mode, effects, and criticality analysis (FMECA) identified temperature cycling as the critical failure cause of low-power processor packages. Package temperature was measured while the processor load was increased in steps, to examine the relationship between processor load and package temperature. Under the most conservative operating conditions, a physics-of-failure reliability model for temperature cycling predicted an expected lifetime of roughly 3 years or less for low-power processor packages. Based on the experimental results, the paper presents recommendations for extending the lifetime of low-power processor packages.
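The abstract does not name the physics-of-failure model behind the lifetime prediction. The sketch below is for illustration only and shows how a temperature-cycling lifetime estimate of this kind can be assembled. It rests on two assumptions:

* a Norris-Landzberg-type acceleration model, which is a common choice for solder-joint thermal fatigue;
* a hypothetical linear map from processor load to package temperature.

The function names, the test and field conditions, and every numeric input are assumptions, not the paper's data. The default exponents are the classic eutectic SnPb constants, so a lead-free (SAC) package would need separately fitted constants.

```python
import math

KELVIN_OFFSET = 273.15


def package_temp_c(load: float, t_idle_c: float, slope_c_per_load: float) -> float:
    """Hypothetical linear load-to-temperature map, with load in [0, 1].

    The paper measures package temperature at several load levels; a linear
    fit is only an illustrative assumption here.
    """
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be between 0 and 1")
    return t_idle_c + slope_c_per_load * load


def norris_landzberg_af(dt_test, dt_field, f_test, f_field,
                        tmax_test_c, tmax_field_c,
                        n=1.9, m=1.0 / 3.0, ea_over_k=1414.0):
    """Acceleration factor AF = N_field / N_test (Norris-Landzberg form).

    dt_*: temperature swing per cycle (deg C), f_*: cycles per day,
    tmax_*: peak temperature (deg C). Defaults are the classic eutectic SnPb
    constants (assumed); lead-free solder needs its own fitted constants.
    """
    tmax_test_k = tmax_test_c + KELVIN_OFFSET
    tmax_field_k = tmax_field_c + KELVIN_OFFSET
    return ((dt_test / dt_field) ** n
            * (f_field / f_test) ** m
            * math.exp(ea_over_k * (1.0 / tmax_field_k - 1.0 / tmax_test_k)))


def field_life_years(n_test_cycles, af, f_field_per_day):
    """Convert cycles-to-failure in test into expected field life in years."""
    return n_test_cycles * af / f_field_per_day / 365.0


if __name__ == "__main__":
    # All numbers below are hypothetical placeholders, not the paper's data.
    t_low = package_temp_c(0.0, t_idle_c=45.0, slope_c_per_load=40.0)
    t_high = package_temp_c(1.0, t_idle_c=45.0, slope_c_per_load=40.0)
    af = norris_landzberg_af(dt_test=165.0, dt_field=t_high - t_low,
                             f_test=48.0, f_field=24.0,
                             tmax_test_c=125.0, tmax_field_c=t_high)
    life = field_life_years(1000, af, 24.0)
    print(f"AF = {af:.2f}, life = {life:.1f} years")
```

The script prints an acceleration factor and a life estimate for the placeholder inputs only. To tie the estimate to a real system, you would substitute the following:

* measured idle and full-load package temperatures;
* the real number of load cycles per day;
* test cycles-to-failure for the actual package.

In this model, a larger load-driven temperature swing or a higher peak package temperature shortens the predicted field life.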
