• Title/Summary/Keyword: Markov Decision Processes

Equivalent Transformations of Undiscounted Nonhomogeneous Markov Decision Processes

  • Park, Yun-Sun
    • Journal of the Korean Operations Research and Management Science Society / v.17 no.2 / pp.131-144 / 1992
  • Even though nonhomogeneous Markov Decision Processes subsume homogeneous Markov Decision Processes and are more practical in the real world, there are not many results for them. In this paper we address the nonhomogeneous Markov Decision Process with the objective of maximizing average reward. By extending the work of Ross [17] in the homogeneous case and adopting the result of Bean and Smith [3] for the discounted deterministic problem, we first transform the original problem into a discounted nonhomogeneous Markov Decision Process. We then transform that problem into a discounted deterministic problem. This approach not only shows the interrelationships between these problems but also provides a solution method for the undiscounted nonhomogeneous Markov Decision Process.


Partially Observable Markov Decision Processes (POMDPs) and Wireless Body Area Networks (WBAN): A Survey

  • Mohammed, Yahaya Onimisi;Baroudi, Uthman A.
    • KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.5 / pp.1036-1057 / 2013
  • The wireless body area network (WBAN) is a promising candidate for future health monitoring systems. Nevertheless, the path to mature solutions still faces many challenges. Energy-efficient scheduling is one of them, given the scarcity of energy available to biosensors and the lack of portability. Researchers from academia, industry and the health sector are therefore working together to realize practical solutions. The main difficulty in WBAN is the uncertainty in the state of the monitored system. Intelligent learning approaches such as the Markov Decision Process (MDP) have been proposed to tackle this issue. An MDP is a form of Markov chain in which the transition matrix depends on the action taken by the decision maker (agent) at each time step. The agent receives a reward, which depends on the action and the state. The goal is to find a function, called a policy, that specifies which action to take in each state so as to maximize some utility function (e.g., the mean or expected discounted sum) of the sequence of rewards. A Partially Observable Markov Decision Process (POMDP) is a generalization of the MDP that allows for incomplete information about the state of the system. In this case, the state is not visible to the agent. POMDPs have many applications in operations research and artificial intelligence. Because knowledge of the system is incomplete, formulating and solving POMDP models is mathematically complex and computationally expensive, and limited progress has been made in applying POMDPs to real applications. In this paper, we survey the existing methods and algorithms for solving POMDPs in the general domain and in WBAN in particular. In addition, the paper discusses recent real implementations of POMDPs on practical WBAN problems. We believe that this work will provide valuable insights for newcomers who would like to pursue related research in the domain of WBAN.
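
The MDP definition in this abstract (action-dependent transition matrix, state-action reward, policy maximizing the expected discounted sum of rewards) can be illustrated with a minimal value-iteration sketch. This is not taken from the surveyed paper: the two-state toy problem, the names `q_values` and `value_iteration`, the discount factor 0.9 and the stopping tolerance are all assumptions made for illustration.

```python
def q_values(P, R, V, gamma):
    """Q[s][a] = R[s][a] + gamma * sum_t P[a][s][t] * V[t]."""
    n_states, n_actions = len(R), len(P)
    return [[R[s][a] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
             for a in range(n_actions)]
            for s in range(n_states)]


def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Return (values, greedy policy) for a finite discounted MDP.

    P[a][s][t] is the probability of moving from state s to state t
    under action a; R[s][a] is the reward for taking action a in state s.
    """
    V = [0.0] * len(R)
    while True:
        # Bellman optimality backup: best action value in every state.
        V_new = [max(row) for row in q_values(P, R, V, gamma)]
        converged = max(abs(x - y) for x, y in zip(V_new, V)) < tol
        V = V_new
        if converged:
            break
    Q = q_values(P, R, V, gamma)
    policy = [max(range(len(P)), key=lambda a: Q[s][a]) for s in range(len(R))]
    return V, policy


# Hypothetical toy problem: action 0 stays in the current state,
# action 1 switches to the other state; only staying in state 1 pays.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
R = [
    [0.0, 0.0],  # state 0: no reward for either action
    [1.0, 0.0],  # state 1: reward 1 for staying
]
values, policy = value_iteration(P, R)
print(policy, [round(v, 3) for v in values])
```

In this toy problem the greedy policy switches out of state 0 and stays in state 1, which matches the intuition that the agent should move to the rewarding state and remain there.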

System Replacement Policy for A Partially Observable Markov Decision Process Model

  • Kim, Chang-Eun
    • Journal of Korean Institute of Industrial Engineers / v.16 no.2 / pp.1-9 / 1990
  • This study examines the control of deterioration processes for which only incomplete state information is available. When the deterioration is governed by a Markov process, such processes are known as Partially Observable Markov Decision Processes (POMDPs), which drop the assumption that the state or level of deterioration of the system is known exactly. This research investigates a two-state partially observable Markov chain in which only deterioration can occur and for which the only possible actions are to replace or to leave alone. The goal of this research is to develop a new jump algorithm that has the potential to solve system problems dealing with continuous state space Markov chains.
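
A two-state replacement POMDP of the kind described here is usually tracked through a belief, that is, the probability that the system is in the deteriorated state. The sketch below shows only a standard Bayesian belief update for such a model; it is not the paper's jump algorithm. The observation model (a binary alarm signal), the function name `belief_update` and all probabilities are hypothetical, since the abstract does not specify them.

```python
def belief_update(b_bad, action, alarm,
                  p_fail=0.1, p_alarm_good=0.2, p_alarm_bad=0.8):
    """Update the probability that the system is in the 'bad' state.

    Assumptions (not from the paper): the system moves from good to bad
    with probability p_fail per period, never recovers on its own, and
    emits a binary alarm whose probability depends on the hidden state.
    """
    if action == "replace":
        # Assumed: a replaced system is known to be good, so the
        # belief resets and the observation is ignored.
        return 0.0
    # Prediction step for "leave alone": only deterioration can occur.
    prior = b_bad + (1.0 - b_bad) * p_fail
    # Correction step: Bayes' rule with the observation likelihoods.
    like_bad = p_alarm_bad if alarm else 1.0 - p_alarm_bad
    like_good = p_alarm_good if alarm else 1.0 - p_alarm_good
    evidence = prior * like_bad + (1.0 - prior) * like_good
    return prior * like_bad / evidence


# Starting from a system known to be good, observe one alarm,
# then leave the system alone and observe no alarm.
b_after_alarm = belief_update(0.0, "leave", alarm=True)
b_after_quiet = belief_update(b_after_alarm, "leave", alarm=False)
b_after_replace = belief_update(b_after_quiet, "replace", alarm=False)
print(b_after_alarm, b_after_quiet, b_after_replace)
```

Because the belief is a single number in [0, 1], a replace-or-leave policy for this model can be expressed as a rule on that number, which is why the state space of the equivalent fully observable problem is continuous.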


Localization and a Distributed Local Optimal Solution Algorithm for a Class of Multi-Agent Markov Decision Processes

  • Chang, Hyeong-Soo
    • International Journal of Control, Automation, and Systems / v.1 no.3 / pp.358-367 / 2003
  • We consider discrete-time factorial Markov Decision Processes (MDPs) in a multiple decision-maker environment under the infinite-horizon average reward criterion, with a general joint reward structure but a factorial joint state transition structure. We introduce the concept of "localization", in which a global MDP is localized for each agent so that each agent needs to consider only a local MDP defined over its own state and action spaces. Based on this, we present a gradient-ascent-like iterative distributed algorithm that converges to a locally optimal solution of the global MDP. The solution is an autonomous joint policy in that each agent's decision is based only on its local state.

A MARKOV DECISION PROCESSES FORMULATION FOR THE LINEAR SEARCH PROBLEM

  • Balkhi, Z.T.;Benkherouf, L.
    • Journal of the Korean Operations Research and Management Science Society / v.19 no.1 / pp.201-206 / 1994
  • The linear search problem is concerned with finding a hidden target on the real line R. The position of the target is governed by some probability distribution. It is desired to find the target in the least expected search time. This problem has been formulated as an optimization problem by a number of authors without making use of Markov Decision Process (MDP) theory. The aim of the paper is to give an MDP formulation of the search problem that we feel is both natural and easy to follow.


A Localized Adaptive QoS Routing Scheme Using POMDP and Exploration Bonus Techniques (POMDP와 Exploration Bonus를 이용한 지역적이고 적응적인 QoS 라우팅 기법)

  • Han Jeong-Soo
    • The Journal of Korean Institute of Communications and Information Sciences / v.31 no.3B / pp.175-182 / 2006
  • In this paper, we propose a Localized Adaptive QoS Routing Scheme using POMDP and Exploration Bonus techniques. Because solving a POMDP by dynamic programming is computationally very expensive, we also show that the CEA technique, which uses expectation values, can simplify the POMDP problem. We use an Exploration Bonus to search for a detour path better than the current path, and for this purpose we propose an algorithm (SEMA) that searches multiple paths. In particular, we evaluate the service success rate and average hop count with respect to the performance parameters $\phi$ and k, which are defined as the exploration count and interval. The results show that a larger $\phi$ gives a better detour path search, and that increasing n increases the amount of exploration.

Seamless Mobility of Heterogeneous Networks Based on Markov Decision Process

  • Preethi, G.A.;Chandrasekar, C.
    • Journal of Information Processing Systems / v.11 no.4 / pp.616-629 / 2015
  • A mobile terminal can expect a number of handoffs within its call duration. During a mobile call, when a mobile node moves from one cell to another, it should connect to another access point within its range. If its own network lacks support, it must change over to another base station. When moving to another network, quality of service parameters need to be considered. In our study we have used the Markov decision process approach for a seamless handoff, as it gives the optimum results for selecting a network when compared to other multiple attribute decision making processes. We have used the network cost function for selecting the network for handoff and the connection reward function, which is based on the values of the quality of service parameters. We have also examined the constant bit rate and transmission control protocol packet delivery ratio. We used the policy iteration algorithm to determine the optimal policy. Our enhanced handoff algorithm outperforms previous multiple attribute decision making methods.
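
The policy iteration step mentioned in this abstract can be sketched on a toy handoff problem. The sketch is not the authors' model: the two networks, their QoS rewards, the switching cost, the discount factor and the names `evaluate_policy` and `policy_iteration` are all hypothetical. Policy evaluation is done by repeated backups to a tight tolerance rather than by solving the linear system directly, which keeps the example free of external libraries.

```python
def evaluate_policy(policy, P, R, gamma, tol=1e-10):
    """Iteratively compute the value of a fixed policy."""
    n = len(policy)
    V = [0.0] * n
    while True:
        V_new = [R[s][policy[s]]
                 + gamma * sum(P[policy[s]][s][t] * V[t] for t in range(n))
                 for s in range(n)]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new
        V = V_new


def policy_iteration(P, R, gamma=0.9):
    """Alternate policy evaluation and greedy improvement until stable."""
    n_states, n_actions = len(R), len(P)
    policy = [0] * n_states
    while True:
        V = evaluate_policy(policy, P, R, gamma)
        improved = []
        for s in range(n_states):
            q = [R[s][a] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
                 for a in range(n_actions)]
            best = max(range(n_actions), key=lambda a: q[a])
            # Keep the current action on (near) ties so the loop terminates.
            if q[best] - q[policy[s]] <= 1e-9:
                best = policy[s]
            improved.append(best)
        if improved == policy:
            return policy, V
        policy = improved


# Hypothetical handoff model: the state is the network currently in use,
# the action is the network chosen for the next period.
qos_reward = [1.0, 2.0]  # assumed per-period connection reward of each network
switch_cost = 0.5        # assumed network cost charged on a handoff
P = [
    [[1.0, 0.0], [1.0, 0.0]],  # action 0: end up on network 0
    [[0.0, 1.0], [0.0, 1.0]],  # action 1: end up on network 1
]
R = [[qos_reward[a] - (switch_cost if a != s else 0.0) for a in range(2)]
     for s in range(2)]
policy, values = policy_iteration(P, R)
print(policy, [round(v, 3) for v in values])
```

With these assumed numbers the higher reward of network 1 outweighs the one-time switching cost, so the resulting policy selects network 1 from either state.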

Rental Resource Management Model with Capacity Expansion and Return (용량 확장과 반납을 갖는 렌탈 자원 관리모델)

  • Kim Eun-Gab;Byun Jin-Ho
    • Journal of the Korean Operations Research and Management Science Society / v.31 no.3 / pp.81-96 / 2006
  • We consider a rental company that dynamically manages its capacity level through capacity addition and return. While serving customers with its own capacity, the company expands its capacity by renting items from an outside source so that it can avoid the lost rental opportunities that occur when stock is insufficient. If stock becomes large enough to cope with demand, the company returns the expanded capacity to the outside source. Formulating the model as a Markov decision problem, we identify an optimal capacity management policy that states when the company should expand its capacity and when it should return the expanded capacity after capacity addition. Since it is intractable to find the optimal capacity management policy and the optimal size of capacity expansion analytically, we present a numerical procedure that finds these optimal values based on the value iteration method. We carry out a numerical analysis and observe monotonic properties of the optimal performance measures with respect to system parameters, which are meaningful in developing effective heuristic policies.

Demand Variability Impact on the Replenishment Policy in a Two-Echelon Supply Chain Model (두 계층 공급사슬 모형에서 발주정책에 대한 수요 변동성 영향)

  • Kim Eungab
    • Journal of the Korean Operations Research and Management Science Society / v.29 no.3 / pp.111-127 / 2004
  • We consider a supply chain model with a make-to-order production facility and a single supplier. The model we treat here is a special case of a two-echelon inventory model. Unlike classical two-echelon systems, the demand process at the supplier is affected by the production process at the production facility as well as by the customer order arrival process. In this paper, we address how demand variability affects the optimal replenishment policy. To this end, we incorporate Erlang and phase-type demand distributions into the model. Formulating the model as a Markov decision problem, we investigate the structure of the optimal replenishment policy. We also perform a sensitivity analysis on the optimal policy and establish its monotonicity with respect to system cost parameters.

A Study of Adaptive QoS Routing scheme using Policy-gradient Reinforcement Learning (정책 기울기 값 강화학습을 이용한 적응적인 QoS 라우팅 기법 연구)

  • Han, Jeong-Soo
    • Journal of the Korea Society of Computer and Information / v.16 no.2 / pp.93-99 / 2011
  • In this paper, we propose a policy-gradient routing scheme based on Reinforcement Learning (RL) that can be used for adaptive QoS routing. Policy-gradient RL routing can learn network environments quickly because it adapts the policy using gradient values of the estimated average reward. We show that fast learning of the network environment results in a high routing success rate. To demonstrate this, we simulate the scheme and compare it with three different schemes.