Open Access. Powered by Scholars. Published by Universities.®

Engineering Commons

Full-Text Articles in Engineering

A New Reinforcement Learning Algorithm With Fixed Exploration For Semi-Markov Decision Processes, Angelo Michael Encapera, Jan 2017

Masters Theses

"Artificial intelligence or machine learning techniques are currently being widely applied for solving problems within the field of data analytics. This work presents and demonstrates the use of a new machine learning algorithm for solving semi-Markov decision processes (SMDPs). SMDPs are encountered in the domain of Reinforcement Learning to solve control problems in discrete-event systems. The new algorithm developed here is called iSMART, an acronym for imaging Semi-Markov Average Reward Technique. The algorithm uses a constant exploration rate, unlike its precursor R-SMART, which required exploration decay. The major difference between R-SMART and iSMART is that the latter uses, in addition ...


A Bounded Actor-Critic Algorithm For Reinforcement Learning, Ryan Jacob Lawhead, Jan 2017

Masters Theses

"This thesis presents a new actor-critic algorithm from the domain of reinforcement learning to solve Markov and semi-Markov decision processes (or problems) in the field of airline revenue management (ARM). The ARM problem is one of control optimization in which a decision-maker must accept or reject a customer based on a requested fare. This thesis focuses on the so-called single-leg version of the ARM problem, which can be cast as a semi-Markov decision process (SMDP). Large-scale Markov decision processes (MDPs) and SMDPs suffer from the curses of dimensionality and modeling, making it difficult to create the transition probability matrices (TPMs ...