• Title/Summary/Keyword: Policy Gradient

Autonomous control of bicycle using Deep Deterministic Policy Gradient Algorithm (Deep Deterministic Policy Gradient 알고리즘을 응용한 자전거의 자율 주행 제어)

  • Choi, Seung Yoon; Le, Pham Tuyen; Chung, Tae Choong
    • Convergence Security Journal, v.18 no.3, pp.3-9, 2018
  • The Deep Deterministic Policy Gradient (DDPG) algorithm learns by combining artificial neural networks with reinforcement learning. Among recently studied reinforcement learning methods, DDPG has the advantage that, because it learns off-policy, it prevents wrong actions from accumulating and degrading the learning. In this study, we apply the DDPG algorithm to control a bicycle autonomously. Simulations were carried out in various environments, and the method used in the experiments was shown to work stably in simulation.

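For orientation, here is a minimal, illustrative sketch (not taken from the paper) of the off-policy ingredient the abstract highlights: DDPG stores transitions in a replay buffer and samples them uniformly, so a run of wrong actions does not dominate consecutive updates. The class and parameter names are assumptions for illustration.

```python
# Illustrative sketch, not the paper's code: a minimal replay buffer showing the
# off-policy ingredient of DDPG. Transitions from an exploratory behaviour policy
# are stored and later sampled uniformly, which decorrelates consecutive updates.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are dropped first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        # Uniform sampling so a streak of wrong actions does not dominate learning.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```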

A Study of Adaptive QoS Routing scheme using Policy-gradient Reinforcement Learning (정책 기울기 값 강화학습을 이용한 적응적인 QoS 라우팅 기법 연구)

  • Han, Jeong-Soo
    • Journal of the Korea Society of Computer and Information, v.16 no.2, pp.93-99, 2011
  • In this paper, we propose a policy-gradient reinforcement learning routing scheme that can be used for adaptive QoS routing. Policy-gradient RL routing can learn the network environment quickly because the policy is adapted using gradient values based on an average estimate of the rewards, and faster learning of the network environment in turn yields a higher routing success rate. To demonstrate this, we run simulations and compare the scheme with three alternative schemes.
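
As an illustration of the idea (not the authors' algorithm), the sketch below shows a policy-gradient step for a softmax routing policy that uses a running average of rewards as its baseline, in the spirit of the average-reward gradient values the abstract describes. The linear features, `theta`, and step sizes are all assumptions for illustration.

```python
# Illustrative sketch: a softmax routing policy over candidate next hops with a
# running average-reward baseline. `features` has one row of link features per
# candidate next hop; `theta` is the policy parameter vector (both assumed).
import numpy as np

def softmax_policy(theta, features):
    """Probability of choosing each candidate next hop."""
    prefs = features @ theta                  # one preference per candidate link
    prefs -= prefs.max()                      # numerical stability
    p = np.exp(prefs)
    return p / p.sum()

def policy_gradient_step(theta, features, action, reward, avg_reward, alpha=0.01):
    """One update: grad log pi(a|s) scaled by (reward - average reward)."""
    probs = softmax_policy(theta, features)
    grad_log_pi = features[action] - probs @ features   # softmax score function
    theta = theta + alpha * (reward - avg_reward) * grad_log_pi
    avg_reward = avg_reward + 0.1 * (reward - avg_reward)  # running baseline
    return theta, avg_reward
```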

Reinforcement Learning based on Deep Deterministic Policy Gradient for Roll Control of Underwater Vehicle (수중운동체의 롤 제어를 위한 Deep Deterministic Policy Gradient 기반 강화학습)

  • Kim, Su Yong; Hwang, Yeon Geol; Moon, Sung Woong
    • Journal of the Korea Institute of Military Science and Technology, v.24 no.5, pp.558-568, 2021
  • Existing underwater vehicle controllers are designed by linearizing the nonlinear dynamics model around a specific operating regime. Because such linear controllers can show unstable control performance in transient states, various studies have tried to overcome this limitation, and recent work has used reinforcement learning to improve transient-state performance. Reinforcement learning can be broadly divided into value-based and policy-based methods. In this paper, we propose a roll controller for an underwater vehicle based on Deep Deterministic Policy Gradient (DDPG) that learns the control policy and shows stable control performance across various situations and environments. The performance of the proposed DDPG-based roll controller was verified through simulation and compared with existing roll controllers based on PID and on DQN with Normalized Advantage Functions.
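
For readers unfamiliar with DDPG, the following PyTorch sketch (an illustration, not the paper's controller) shows the core update: the critic regresses onto a bootstrapped target from slowly updated target networks, and the actor follows the deterministic policy gradient through the critic. The `actor`, `critic`, optimizers, and batch format are assumed to be defined elsewhere.

```python
# Illustrative DDPG update step; networks, optimizers, and batch are assumptions.
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch, gamma=0.99, tau=0.005):
    s, a, r, s2, done = batch   # tensors from a replay buffer; done is a 0/1 float tensor

    # Critic: regress Q(s, a) onto the bootstrapped target from the target networks.
    with torch.no_grad():
        y = r + gamma * (1.0 - done) * target_critic(s2, target_actor(s2))
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: deterministic policy gradient, i.e. ascend Q(s, mu(s)).
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft (Polyak) update of the target networks.
    for net, target in ((actor, target_actor), (critic, target_critic)):
        for p, tp in zip(net.parameters(), target.parameters()):
            tp.data.mul_(1.0 - tau).add_(tau * p.data)
```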

Learning Optimal Trajectory Generation for Low-Cost Redundant Manipulator using Deep Deterministic Policy Gradient(DDPG) (저가 Redundant Manipulator의 최적 경로 생성을 위한 Deep Deterministic Policy Gradient(DDPG) 학습)

  • Lee, Seunghyeon; Jin, Seongho; Hwang, Seonghyeon; Lee, Inho
    • The Journal of Korea Robotics Society, v.17 no.1, pp.58-67, 2022
  • In this paper, we propose an approach for resolving the workspace inaccuracy of a low-cost redundant manipulator with low-resolution encoders and low stiffness. When manipulators are built with low-cost encoders and links, the robots can run into workspace inaccuracy issues, and trajectory generation based on conventional forward/inverse kinematics that ignores these issues risks end-effector fluctuations. Hence, we propose a trajectory-generation optimization based on the DDPG (Deep Deterministic Policy Gradient) algorithm for low-cost redundant manipulators reaching a target position in Euclidean space. The DDPG agent is designed to minimize the distance to the target together with the Jacobian condition number. Training is performed in a physics-based simulator with error rates applied to randomly generated joint configurations, and the approach is demonstrated in a real robotic experiment.
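
A hypothetical sketch of the kind of objective the abstract describes (not the paper's exact reward): a term that penalizes both the Euclidean distance to the target position and the Jacobian condition number, so that learned trajectories avoid near-singular configurations. The weights and function signature are assumptions.

```python
# Illustrative reward shaping: distance-to-target plus Jacobian condition number.
import numpy as np

def trajectory_reward(end_effector_pos, target_pos, jacobian,
                      w_dist=1.0, w_cond=0.01):
    dist = np.linalg.norm(np.asarray(target_pos) - np.asarray(end_effector_pos))
    cond = np.linalg.cond(jacobian)      # grows large near singular configurations
    return -(w_dist * dist + w_cond * cond)
```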

A Study on Load Distribution of Gaming Server Using Proximal Policy Optimization (Proximal Policy Optimization을 이용한 게임서버의 부하분산에 관한 연구)

  • Park, Jung-min; Kim, Hye-young; Cho, Sung Hyun
    • Journal of Korea Game Society, v.19 no.3, pp.5-14, 2019
  • Game servers are built on distributed servers. To distribute their workloads, distributed game servers apply algorithms that balance each server's workload across the servers and, as a result, efficiently manage the response time and usability of the servers requested by clients. In this paper, we propose a load-balancing agent using PPO (Proximal Policy Optimization), one of the policy-gradient methods from reinforcement learning. The proposed load-balancing agent is compared with previous research through simulation.
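
For reference, the clipped surrogate loss that defines PPO is sketched below (an illustration, not the paper's agent). `log_prob_new` and `log_prob_old` are the log-probabilities of the chosen load-balancing actions under the current and behaviour policies, and `advantage` is an advantage estimate; all are assumed to be PyTorch tensors.

```python
# Illustrative PPO clipped surrogate objective.
import torch

def ppo_clip_loss(log_prob_new, log_prob_old, advantage, clip_eps=0.2):
    ratio = torch.exp(log_prob_new - log_prob_old)            # pi_new / pi_old
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantage
    return -torch.min(unclipped, clipped).mean()              # maximise the surrogate
```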

Experimental Analysis of A3C and PPO in the OpenAI Gym Environment (OpenAI Gym 환경에서 A3C와 PPO의 실험적 분석)

  • Hwang, Gyu-Young; Lim, Hyun-Kyo; Heo, Joo-Seong; Han, Youn-Hee
    • Proceedings of the Korea Information Processing Society Conference, 2019.05a, pp.545-547, 2019
  • Policy-gradient learning is an actively studied topic in recent reinforcement learning research. In this paper, we present a comparative analysis of the learning performance of two policy-gradient algorithms, Asynchronous Advantage Actor-Critic (A3C) and Proximal Policy Optimization (PPO), in the 'CartPole-v0' and 'Pendulum-v0' environments of OpenAI Gym. Keeping the conditions the two algorithms can share, such as the deep learning model, as identical as possible, we measured how the score changes as episodes progress. The experiments confirm that PPO outperforms A3C in both of the two different environments.
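
A minimal sketch of the kind of per-episode score logging such a comparison uses (not the authors' experiment code). It assumes the old gym API of the 'CartPole-v0' / 'Pendulum-v0' era, where `reset()` returns the observation and `step()` returns `(obs, reward, done, info)`, and a hypothetical `agent` with `act()` and `observe()` methods.

```python
# Illustrative per-episode score logging loop on classic OpenAI Gym tasks.
import gym

def run_episodes(agent, env_id="CartPole-v0", episodes=100):
    env = gym.make(env_id)
    scores = []
    for _ in range(episodes):
        obs, done, score = env.reset(), False, 0.0
        while not done:
            action = agent.act(obs)                      # assumed agent interface
            next_obs, reward, done, _ = env.step(action)
            agent.observe(obs, action, reward, next_obs, done)
            obs, score = next_obs, score + reward
        scores.append(score)
    return scores
```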

Determination of Ship Collision Avoidance Path using Deep Deterministic Policy Gradient Algorithm (심층 결정론적 정책 경사법을 이용한 선박 충돌 회피 경로 결정)

  • Kim, Dong-Ham; Lee, Sung-Uk; Nam, Jong-Ho; Furukawa, Yoshitaka
    • Journal of the Society of Naval Architects of Korea, v.56 no.1, pp.58-65, 2019
  • The stability, reliability, and efficiency of a smart ship are important issues as interest in autonomous ships has recently grown. An automatic collision avoidance system is an essential function of an autonomous ship: it detects the possibility of collision and automatically takes avoidance actions in consideration of economy and safety. In this work, to construct an automatic collision avoidance system using reinforcement learning, the sequential decision problem of ship collision is mathematically formulated as a Markov Decision Process (MDP). A reinforcement learning environment is constructed based on the ship maneuvering equations, and the three key components of the MDP (state, action, and reward) are defined. The state uses parameters of the relationship between own-ship and target-ship, the action is the perpendicular distance away from the target course, and the reward is defined as a function considering safety and economics. To solve the sequential decision problem, the Deep Deterministic Policy Gradient (DDPG) algorithm, which can express a continuous action space and search for an optimal action policy, is utilized. The collision avoidance system is then tested assuming a $90^{\circ}$ intersection encounter situation and yields satisfactory results.
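
As a rough illustration of the reward structure described in the abstract (an assumption, not the paper's function): a safety term that penalizes closing within a safe passing distance of the target ship, plus an economy term that penalizes deviation from the planned course.

```python
# Illustrative safety-plus-economy reward for collision avoidance; all weights,
# names, and the safe-distance threshold are assumptions for illustration.
import numpy as np

def collision_avoidance_reward(own_pos, target_pos, offset_from_course,
                               safe_distance=1.0, w_safety=10.0, w_economy=1.0):
    separation = np.linalg.norm(np.asarray(own_pos) - np.asarray(target_pos))
    safety_penalty = max(0.0, safe_distance - separation)   # only when too close
    economy_penalty = abs(offset_from_course)               # deviation costs time/fuel
    return -(w_safety * safety_penalty + w_economy * economy_penalty)
```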

Prediction of Daphnia Production along a Trophic Gradient

  • Park, Sang-Kyu; Goldman, C.R.
    • Journal of Ecology and Environment, v.31 no.2, pp.125-129, 2008
  • To predict Daphnia secondary productivity along a trophic gradient indexed as total phosphorus (TP) concentration, we estimated energy transfer efficiencies from food quality for Daphnia such as eicosapentaenoic acid (EPA) or docosahexaenoic acid (DHA) content. Eleven flow-through Daphnia magna growth experiments were conducted with seston from 9 lakes, ponds, and river waters. Primary productivities were estimated from food supply rates in the flow-through experiments, producing energy transfer efficiencies from seston to D. magna. We found DHA content was the best predictor of energy transfer efficiencies among the essential fatty acids. An asymptotic saturation model explained 79.6% of the variability in energy transfer efficiencies. Based on empirical data in this study and empirical models from the literature, we predict that Daphnia productivity would peak in mesotrophic systems due to decreasing food quality and increasing food quantity along the trophic gradient.

Hovering Control of 1-Axial Drone with Reinforcement Learning (강화학습을 이용한 1축 드론 수평 제어)

  • Lee, Taewoo; Ryu, Jinhoo; Park, Heemin
    • Journal of Korea Multimedia Society, v.21 no.2, pp.250-260, 2018
  • As a step toward controlling a quadcopter with reinforcement learning, hovering of a 1-axial drone prototype is implemented through reinforcement learning. A complementary filter is used to measure the angle correctly, and a modified complementary filter extends the measured range to -180 to +180 degrees. The policy gradient method with the REINFORCE algorithm is used for reinforcement learning. The prototype trained in this way showed how performance differs depending on the episode length.
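
For context, a basic complementary filter is sketched below (an assumption, not the paper's modified filter): it blends the integrated gyro rate with the accelerometer-derived angle; the paper's modification extends this kind of estimator so the measured angle covers the full -180 to +180 degree range.

```python
# Illustrative basic complementary filter for angle estimation.
def complementary_filter(prev_angle, gyro_rate, accel_angle, dt, alpha=0.98):
    """High-pass the gyro path, low-pass the accelerometer path, and blend them."""
    return alpha * (prev_angle + gyro_rate * dt) + (1.0 - alpha) * accel_angle
```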

Robot Locomotion via RLS-based Actor-Critic Learning (RLS 기반 Actor-Critic 학습을 이용한 로봇이동)

  • Kim, Jong-Ho; Kang, Dae-Sung; Park, Joo-Young
    • Proceedings of the Korean Institute of Intelligent Systems Conference, 2005.11a, pp.234-237, 2005
  • Among the many approaches to reinforcement learning, actor-critic learning based on policy iteration has shown its potential through numerous applications. Actor-critic learning requires training an actor for the control-input selection strategy and a critic for value function approximation. This paper proposes a new algorithm that uses RLS (recursive least squares), which guarantees fast convergence, to train the critic, and uses the policy gradient to train the actor. The performance of the proposed method is verified experimentally.

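To make the critic update concrete, here is an illustrative recursive least-squares step for a linear critic over state features (not the authors' code): `w` is a column weight vector, `P` the inverse feature covariance, `phi` the state features, `target` a value target such as a TD target, and `lam` a forgetting factor; all names are assumptions.

```python
# Illustrative RLS update for a linear value-function critic.
import numpy as np

def rls_critic_update(w, P, phi, target, lam=0.99):
    """One RLS step; w is (d, 1), P is (d, d), phi is the d-dimensional feature vector."""
    phi = np.asarray(phi, dtype=float).reshape(-1, 1)    # column feature vector
    gain = P @ phi / (lam + (phi.T @ P @ phi).item())    # Kalman-style gain
    error = target - (w.T @ phi).item()                  # residual against the value target
    w = w + gain * error
    P = (P - gain @ phi.T @ P) / lam                     # forgetting-factor covariance update
    return w, P
```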