• Title/Summary/Keyword: Reward Function

An Improved DSA Strategy based on Triple-States Reward Function (Triple-state 보상 함수를 기반으로 한 개선된 DSA 기법)

  • Ahmed, Tasmia;Gu, Jun-Rong;Jang, Sung-Jeen;Kim, Jae-Moung
    • Journal of the Institute of Electronics Engineers of Korea TC / v.47 no.11 / pp.59-68 / 2010
  • In this paper, we present a new method for Dynamic Spectrum Access (DSA) that modifies the reward function. The Partially Observable Markov Decision Process (POMDP) is a suitable framework for predicting upcoming spectrum opportunities. In a POMDP, the reward function is the final component and is critical for prediction. However, the conventional reward function has only two states (Busy and Idle). When a collision occurs in the channel, the reward function reports the busy state, which reduces the throughput of the secondary user. In this paper, we focus on the difference between the busy and collision states. We propose a new reward function algorithm that indicates an additional collision state, which provides better communication opportunities for secondary users. With the help of the new reward function, secondary users properly utilize opportunities to access Primary User channels for efficient data transmission. We also derive the belief vector of the new algorithm mathematically. Simulation results corroborate the superior performance of the improved reward function: the new algorithm increases the throughput of secondary users in a cognitive radio network.
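The distinction between a two-state and a triple-state reward function can be illustrated with a minimal Python sketch. The abstract does not give the reward values, so the numbers below, and the names `Outcome`, `TWO_STATE_REWARD`, `TRIPLE_STATE_REWARD`, and `total_reward`, are hypothetical placeholders rather than the authors' definitions.

```python
from enum import Enum


class Outcome(Enum):
    """Result of a secondary user's access attempt in one time slot."""
    IDLE = "idle"            # channel was free
    BUSY = "busy"            # channel was occupied by the primary user
    COLLISION = "collision"  # transmission collided in the channel


# Two-state table: a collision is indistinguishable from a busy channel.
# All numeric values are assumed for illustration only.
TWO_STATE_REWARD = {Outcome.IDLE: 1.0, Outcome.BUSY: 0.0, Outcome.COLLISION: 0.0}

# Triple-state table: the collision outcome receives its own reward value.
TRIPLE_STATE_REWARD = {Outcome.IDLE: 1.0, Outcome.BUSY: 0.0, Outcome.COLLISION: -1.0}


def total_reward(outcomes, table):
    """Sum the per-slot rewards of a sequence of outcomes under a reward table."""
    return sum(table[outcome] for outcome in outcomes)
```

With these assumed values, the two tables agree on idle and busy slots and differ only when a collision is observed, which is the extra information the triple-state design exposes to the learner.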

Visual Object Manipulation Based on Exploration Guided by Demonstration (시연에 의해 유도된 탐험을 통한 시각 기반의 물체 조작)

  • Kim, Doo-Jun;Jo, HyunJun;Song, Jae-Bok
    • The Journal of Korea Robotics Society / v.17 no.1 / pp.40-47 / 2022
  • A reward function suited to the task is required to manipulate objects through reinforcement learning. However, the reward function is difficult to design if sufficient information about the objects cannot be obtained. In this study, a demonstration-based object manipulation algorithm called stochastic exploration guided by demonstration (SEGD) is proposed to solve the reward function design problem. SEGD is a reinforcement learning algorithm in which a sparse reward explorer (SRE) and an interpolated policy using demonstration (IPD) are added to soft actor-critic (SAC). SRE ensures the training of the SAC critic by collecting prior data, and IPD limits the exploration space by making SEGD's action similar to the expert's action. Through these two algorithms, SEGD can learn from only the sparse reward of the task, without a designed reward function. To verify SEGD, experiments were conducted on three tasks. SEGD demonstrated its effectiveness with success rates above 96.5% in these experiments.

On the Reward Function of Latent SAC Reinforcement Learning to Improve Longitudinal Driving Performance (종방향 주행성능향상을 위한 Latent SAC 강화학습 보상함수 설계)

  • Jo, Sung-Bean;Jeong, Han-You
    • Journal of IKEEE / v.25 no.4 / pp.728-734 / 2021
  • In recent years, there has been strong interest in end-to-end autonomous driving based on deep reinforcement learning. In this paper, we present a reward function for latent SAC deep reinforcement learning that improves the longitudinal driving performance of an agent vehicle. While the existing reward function significantly degrades driving safety and efficiency, the proposed reward function is shown to maintain an appropriate headway distance while avoiding collision with the front vehicle.

Designing Reward Function for Cooperative Traffic Signal Control at Multi-intersection (다중 교차로에서 협동적 신호제어를 위한 보상함수 설계)

  • Bae, Yo-han;Jang, Jin-heon;Song, Moon-hyuk
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.110-113 / 2022
  • Artificial intelligence has recently begun to replace conventional traffic signal control methods based on mathematical optimization. In response to this trend, many studies are investigating how to apply AI technology properly to traffic signal optimization. However, they focus mainly on which of the many machine learning techniques works well and neglect reward function engineering. In many cases, the reward function consists of the average delay of the vehicles at the intersection. This may cause the AI to misunderstand traffic signal control: what the AI regards as a good situation may not be realistic. The reward function itself may not even meet the required service level. Therefore, this study analyzes the problems of previous reward functions and suggests how the reward function can be enhanced.

Comparative analysis of activation functions within reinforcement learning for autonomous vehicles merging onto highways

  • Dongcheul Lee;Janise McNair
    • International Journal of Internet, Broadcasting and Communication / v.16 no.1 / pp.63-71 / 2024
  • Deep reinforcement learning (RL) significantly influences autonomous vehicle development by optimizing decision-making and adaptation to complex driving environments through simulation-based training. In deep RL, an activation function is used, and various activation functions have been proposed, but their performance varies greatly depending on the application environment. Therefore, finding the optimal activation function according to the environment is important for effective learning. In this paper, we analyzed nine commonly used activation functions for RL to compare and evaluate which activation function is most effective when using deep RL for autonomous vehicles to learn highway merging. To do this, we built a performance evaluation environment and compared the average reward of each activation function. The results showed that the highest reward was achieved using Mish, and the lowest using SELU. The difference in reward between the two activation functions was 10.3%.
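For reference, the two activation functions named in the results, Mish and SELU, can be written out in a few lines of Python using their standard published definitions. This is a scalar sketch for illustration, not the evaluation code used in the paper; the helper `softplus` is introduced here only for numerical stability.

```python
import math

# Standard SELU constants from the self-normalizing networks formulation.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def selu(x):
    """SELU activation: scaled identity for x > 0, scaled exponential otherwise."""
    if x > 0:
        return SELU_SCALE * x
    return SELU_SCALE * SELU_ALPHA * (math.exp(x) - 1.0)
```

Mish is smooth and non-monotonic near zero, whereas SELU saturates at about -1.758 for large negative inputs; the abstract reports only their relative rewards, not why they differ.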

Weight Adjustment Scheme Based on Hop Count in Q-routing for Software Defined Networks-enabled Wireless Sensor Networks

  • Godfrey, Daniel;Jang, Jinsoo;Kim, Ki-Il
    • Journal of information and communication convergence engineering / v.20 no.1 / pp.22-30 / 2022
  • Reinforcement learning has proven its potential for solving sequential decision-making problems under uncertainty, such as finding paths to route data packets in wireless sensor networks. With reinforcement learning, computing the optimum path requires a careful definition of the so-called reward function, which is defined as a linear function that aggregates multiple objective functions into a single objective to compute a numerical value (reward) to be maximized. In a typically defined linear reward function, the multiple objectives to be optimized are integrated as a weighted sum with fixed weighting factors for all learning agents. This study proposes a reinforcement learning-based routing protocol for wireless sensor networks in which different learning agents prioritize different objectives by assigning weighting factors to the aggregated objectives of the reward function. We assign appropriate weighting factors to the objectives in the reward function of a sensor node according to its hop-count distance to the sink node. We expect this approach to enhance the effectiveness of multi-objective reinforcement learning for wireless sensor networks with a balanced trade-off among competing parameters. Furthermore, we propose an SDN (Software Defined Networks) architecture with multiple controllers for constant network monitoring, allowing learning agents to adapt to the dynamics of the network conditions. Simulation results show that the proposed scheme enhances the performance of wireless sensor networks under varied conditions, such as node density and traffic intensity, with a good trade-off among competing performance metrics.
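The idea of a weighted-sum reward whose weights depend on hop count can be sketched as follows. The abstract names neither the objectives nor the weighting rule, so the two objectives (`energy`, `delay`), the linear schedule in `hop_weights`, and all function names are assumptions made purely for illustration.

```python
def hop_weights(hop_count, max_hops):
    """Return per-objective weights for a node, based on its distance to the sink.

    Assumed rule (not from the paper): nodes far from the sink weight energy
    more heavily, nodes near the sink weight delay more heavily.
    """
    if max_hops <= 0:
        raise ValueError("max_hops must be positive")
    ratio = min(max(hop_count / max_hops, 0.0), 1.0)
    return {"energy": ratio, "delay": 1.0 - ratio}


def weighted_reward(objectives, weights):
    """Aggregate normalized objective scores into one scalar reward (weighted sum)."""
    return sum(weights[name] * objectives[name] for name in weights)


# Usage: the same objective scores yield different rewards at different hop counts.
scores = {"energy": 0.5, "delay": 0.2}
near_sink = weighted_reward(scores, hop_weights(1, 10))
far_from_sink = weighted_reward(scores, hop_weights(9, 10))
```

A fixed-weight baseline corresponds to calling `weighted_reward` with the same weight dictionary for every node, which is the setting the abstract contrasts against.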

Reward Shaping for a Reinforcement Learning Method-Based Navigation Framework

  • Roland, Cubahiro;Choi, Donggyu;Jang, Jongwook
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.9-11 / 2022
  • Applying reinforcement learning in everyday applications and varied environments has proved the potential of the field and revealed pitfalls along the way. In robotics, a learning agent gradually takes over control of a robot by abstracting the robot's navigation model with its inputs and outputs, thus reducing human intervention. The challenge for the agent is how to implement a feedback function that facilitates learning of an MDP problem in an environment while reducing the method's convergence time. In this paper, we implement a reward shaping system in a ROS environment that avoids sparse rewards, which give the learning agent less data. Reward shaping prioritizes behaviours that bring the robot closer to the goal by giving intermediate rewards and helps the algorithm converge quickly. We use a pseudocode implementation to illustrate the method.
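One common way to give intermediate rewards for moving closer to a goal is potential-based reward shaping, sketched below in Python. The abstract does not state which shaping scheme the authors use, so this technique is swapped in as an assumption; the names `potential` and `shaped_reward` and the choice of negative Euclidean distance as the potential are hypothetical.

```python
import math


def potential(position, goal):
    """Potential of a state: negative Euclidean distance to the goal (assumed)."""
    return -math.dist(position, goal)


def shaped_reward(sparse_reward, position, next_position, goal, gamma=0.99):
    """Add a potential-based shaping term to the environment's sparse reward.

    Shaping term: gamma * phi(next_state) - phi(state).
    """
    shaping = gamma * potential(next_position, goal) - potential(position, goal)
    return sparse_reward + shaping


# Usage: a step toward the goal is rewarded, a step away is penalized,
# even though the sparse reward is zero in both cases.
goal = (3.0, 0.0)
closer = shaped_reward(0.0, (0.0, 0.0), (1.0, 0.0), goal)
farther = shaped_reward(0.0, (0.0, 0.0), (-1.0, 0.0), goal)
```

Potential-based shaping is a frequent choice because the added term depends only on the difference in potential between consecutive states, so the agent receives a signal at every step rather than only on reaching the goal.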

The Effect of the Complex Reward in STAD Learning on Academic Achievement and Learning Attitudes (STAD학습에서 복합보상이 학업성취도와 학습태도에 미치는 효과)

  • 김선수;최도성
    • Journal of Korean Elementary Science Education / v.21 no.1 / pp.101-109 / 2002
  • Cooperative learning has been adopted to strengthen students' autonomous motivation and to develop desirable attitudes in an atmosphere of mutual cooperation. Some studies on reward effects showed that rewards given after evaluation during cooperative learning acted directly on students' learning motivation, and that group rewards were effective for learning attitude while individual rewards were effective for academic achievement. Assuming that the effects of both rewards would appear when group and individual rewards are combined and applied as a complex reward, this study examined the effect of the complex reward on academic achievement and learning attitude. For this study, 2 classes were randomly selected from an elementary school in Gwangju, and the learning unit was based on chapter 4, 「The structure and function of plants」, in the 5-1 elementary Science textbook. The study was conducted for 4 weeks after the students had learned STAD for the preceding 8 weeks. Learning attitude was examined in pre- and post-tests, and academic achievement was measured twice at 2-week intervals after the pre-test. The results were analyzed using the SAS program. In the case of academic achievement, both groups showed a significant improvement (p < .05). The experimental group showed no significant improvement over the control group in the first test (p > .05), but after 4 weeks it showed a significant improvement over the control group in the second test (p < .05). This result indicates that rewards should be given over a long period and that the individual reward within the complex reward is successful in improving academic achievement. However, in the case of learning attitude, there was no meaningful difference in either group (p > .05), although the control group showed a significant improvement compared with the experimental group (p < .05). This result indicates that the group reward alone is more effective in improving learning attitude and that the complex reward can decrease individual competition in the experimental group.

Exploring reward efficacy in traffic management using deep reinforcement learning in intelligent transportation system

  • Paul, Ananya;Mitra, Sulata
    • ETRI Journal / v.44 no.2 / pp.194-207 / 2022
  • In the last decade, substantial progress has been achieved in intelligent traffic control technologies to overcome the persistent difficulties of traffic congestion and its adverse effect on smart cities. Edge computing is one such advance, facilitating real-time data transmission among vehicles and roadside units to mitigate congestion. This study demonstrates an edge computing-based deep reinforcement learning system that appropriately designs a multiobjective reward function for optimizing different objectives. The system seeks to overcome the challenge of evaluating actions with a simple numerical reward. The selection of reward functions has a significant impact on agents' ability to acquire the ideal behavior for managing multiple traffic signals in a large-scale road network. To ascertain effective reward functions, the agent is trained using the proximal policy optimization method in several deep neural network models, including the state-of-the-art transformer network. The system is verified using both hypothetical scenarios and real-world traffic maps. The comprehensive simulation outcomes demonstrate the potency of the suggested reward functions.

Novel Reward Function for Autonomous Drone Navigating in Indoor Environment

  • Khuong G. T. Diep;Viet-Tuan Le;Tae-Seok Kim;Anh H. Vo;Yong-Guk Kim
    • Proceedings of the Korea Information Processing Society Conference / 2023.11a / pp.624-627 / 2023
  • Unmanned aerial vehicles are gaining popularity with the development of science and technology, and are being used for a wide range of purposes, including surveillance, rescue, delivery of goods, and data collection. In particular, the ability to avoid obstacles during navigation without human oversight is one of the essential capabilities that a drone must possess. Many recent works address this problem with a deep reinforcement learning (DRL) model, whose essential core is the reward function. Therefore, this paper proposes a new reward function with an appropriate action space and employs dueling double deep Q-networks to train a drone to navigate an indoor environment without collision.
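The two ingredients of a dueling double deep Q-network can be sketched without a neural network library, using their standard formulations: the dueling aggregation of a state value and per-action advantages, and the double-DQN bootstrap target. Plain Python lists stand in for network outputs here, and the function names are illustrative assumptions; the paper's reward function and action space are not reproduced.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) = V + A - mean(A)."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + adv - mean_advantage for adv in advantages]


def double_dqn_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """Compute the double-DQN target for one transition.

    The online network selects the next action; the target network evaluates it.
    """
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]
```

Subtracting the mean advantage keeps the value and advantage streams identifiable, and separating action selection from evaluation reduces the overestimation that a single max over one network's Q-values tends to produce.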