Deep reinforcement learning for a multi-objective operation in a nuclear power plant

  • Junyong Bae (Department of Nuclear Engineering, Ulsan National Institute of Science and Technology)
  • Jae Min Kim (Department of Nuclear Engineering, Ulsan National Institute of Science and Technology)
  • Seung Jun Lee (Department of Nuclear Engineering, Ulsan National Institute of Science and Technology)
  • Received : 2023.02.19
  • Accepted : 2023.06.03
  • Published : 2023.09.25

Abstract

Nuclear power plant (NPP) operations involving multiple objectives and devices are still performed manually by operators despite the potential for human error. These operations could be automated to reduce the burden on operators; however, classical control approaches may not be suitable for such multi-objective tasks. An alternative is deep reinforcement learning (DRL), which has successfully automated various complex tasks and has been applied to automate certain NPP operations. Despite this recent progress, however, previous studies applying DRL to NPP operations remain limited in their ability to efficiently handle complex multi-objective operations involving multiple devices. This study proposes a novel DRL-based approach that addresses these limitations by employing a continuous action space and straightforward binary rewards, supported by the adoption of a soft actor-critic algorithm and hindsight experience replay. The feasibility of the proposed approach was evaluated by controlling the pressure and volume of the reactor coolant while heating the coolant during NPP startup. The results show that the proposed approach can train an agent with a proper strategy for effectively achieving multiple objectives through the control of multiple devices. Moreover, hands-on testing demonstrates that the trained agent can handle untrained objectives, such as cooldown, with substantial success.
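
To make the two mechanisms named in the abstract concrete, the following is a minimal Python sketch of a sparse binary goal reward combined with "final"-strategy hindsight experience replay relabeling. The goal-variable layout (pressure, volume, temperature), the tolerances, and all class and function names are illustrative assumptions for this sketch, not the paper's implementation.

```python
import numpy as np
from collections import deque

# Assumed goal layout for illustration: (pressure, volume, temperature).
GOAL_TOL = np.array([0.5, 1.0, 0.5])  # hypothetical per-variable tolerances

def binary_reward(achieved_goal, desired_goal):
    """Sparse binary reward: 0 if every controlled variable is within
    tolerance of its target, -1 otherwise."""
    within = np.all(np.abs(achieved_goal - desired_goal) <= GOAL_TOL)
    return 0.0 if within else -1.0

class HindsightReplayBuffer:
    """Replay buffer with 'final'-strategy hindsight relabeling: each
    episode is also stored as if its last achieved state had been the
    goal all along, turning failed episodes into successful examples."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def store_episode(self, episode):
        # episode: list of (obs, action, achieved_goal, desired_goal, next_obs)
        hindsight_goal = episode[-1][2]  # goal actually reached at episode end
        for obs, action, achieved, desired, next_obs in episode:
            # Original transition, scored against the true (possibly unmet) goal.
            self.buffer.append(
                (obs, action, binary_reward(achieved, desired), desired, next_obs))
            # Relabeled transition: pretend the final achieved state was the goal.
            self.buffer.append(
                (obs, action, binary_reward(achieved, hindsight_goal),
                 hindsight_goal, next_obs))

    def sample(self, batch_size):
        idx = np.random.randint(len(self.buffer), size=batch_size)
        return [self.buffer[i] for i in idx]
```

With the "final" strategy, even episodes that never reach the operator-specified targets still yield transitions with a success signal, which is what keeps a sparse binary reward learnable for an off-policy agent such as a soft actor-critic; the paper's actual goal variables, tolerances, and relabeling strategy may differ.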

Acknowledgement

This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. RS-2022-00165231).
