Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

PDF

Theses/Dissertations

Reinforcement learning

Articles 1 - 30 of 63

Full-Text Articles in Physical Sciences and Mathematics

Reinforcement Learning: Applying Low Discrepancy Action Selection To Deep Deterministic Policy Gradient, Aleksandr Svishchev Jan 2024

Electronic Theses and Dissertations

Reinforcement learning (RL) is a subfield of machine learning concerned with agents learning to behave optimally by interacting with an environment. One of the most important topics in RL is how the agent should explore, that is, how to choose actions in order to rate their impact on long-term reward. For example, a simple baseline strategy might be uniformly random action selection. This thesis investigates the heuristic idea that agents will learn faster if they explore by factoring the environment’s state into their decision and intentionally choose actions which are as different as possible from what they have previously observed. …


Task Distillation: Transforming Reinforcement Learning Into Supervised Learning, Connor Wilhelm Oct 2023

Theses and Dissertations

Recent work in dataset distillation focuses on distilling supervised classification datasets into smaller, synthetic supervised datasets in order to reduce per-model costs of training, to provide interpretability, and to anonymize data. Distillation and its benefits can be extended to a wider array of tasks. We propose a generalization of dataset distillation, which we call task distillation. Using techniques similar to those used in dataset distillation, any learning task can be distilled into a compressed synthetic task. Task distillation allows for transmodal distillations, where a task of one modality is distilled into a synthetic task of another modality, allowing a more …


Insights Into The Application Of Deep Reinforcement Learning In Healthcare And Materials Science, Benjamin R. Smith Aug 2023

Doctoral Dissertations

Reinforcement learning (RL) is a type of machine learning designed to optimize sequential decision-making. While controlled environments have served as a foundation for RL research, due to the growth in data volumes and deep learning methods, it is now increasingly being applied to real-world problems. In our work, we explore and attempt to overcome challenges that occur when applying RL to solve problems in healthcare and materials science.

First, we explore how issues in bias and data completeness affect healthcare applications of RL. To understand how bias has already been considered in this area, we survey the literature for existing …


A Machine Learning Approach To Constructing Ramsey Graphs Leads To The Trahtenbrot-Zykov Problem., Emily Hawboldt Aug 2023

Electronic Theses and Dissertations

Attempts at approaching the well-known and difficult problem of constructing Ramsey graphs via machine learning lead to another difficult problem posed by Zykov in 1963 (now commonly referred to as the Trahtenbrot-Zykov problem): For which graphs F does there exist some graph G such that the neighborhood of every vertex in G induces a subgraph isomorphic to F? Chapter 1 provides a brief introduction to graph theory. Chapter 2 introduces Ramsey theory for graphs. Chapter 3 details a reinforcement learning implementation for Ramsey graph construction. The implementation is based on board game software, specifically the AlphaZero program and its …
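As an illustration only (not code from the thesis), the neighborhood condition in the Trahtenbrot-Zykov problem can be checked by brute force for small graphs. The function names and the adjacency-dictionary representation below are assumptions made for this sketch; it tests whether every vertex neighborhood of a graph G induces a subgraph isomorphic to a given graph F.

```python
from itertools import permutations


def induced_edges(adj, vertices):
    """Edges of the subgraph of `adj` induced by `vertices`, as frozenset pairs."""
    vs = set(vertices)
    return {frozenset((u, v)) for u in vs for v in adj[u] if v in vs}


def is_isomorphic(nodes_a, edges_a, nodes_b, edges_b):
    """Brute-force isomorphism test; only practical for very small graphs."""
    nodes_a, nodes_b = list(nodes_a), list(nodes_b)
    if len(nodes_a) != len(nodes_b) or len(edges_a) != len(edges_b):
        return False
    for perm in permutations(nodes_b):
        mapping = dict(zip(nodes_a, perm))
        # Equal edge counts plus a bijection on vertices mean that mapping
        # every edge of A onto an edge of B is enough.
        if all(frozenset((mapping[u], mapping[v])) in edges_b
               for u, v in map(tuple, edges_a)):
            return True
    return False


def all_neighborhoods_induce(adj, f_nodes, f_edges):
    """True if the neighborhood of every vertex of G induces a copy of F."""
    f_edges = {frozenset(e) for e in f_edges}
    return all(
        is_isomorphic(adj[v], induced_edges(adj, adj[v]), f_nodes, f_edges)
        for v in adj
    )


# Octahedron: each vertex is adjacent to every vertex except its antipode.
octahedron = {v: {u for u in range(6) if u != v and u != (v + 3) % 6}
              for v in range(6)}
cycle4_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
octahedron_has_c4_neighborhoods = all_neighborhoods_induce(
    octahedron, range(4), cycle4_edges)
```

In this example every neighborhood in the octahedron induces a 4-cycle, so the octahedron is a witness graph G for F = C4.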


Reinforcement Learning For Sequential Decision Making With Constraints, Jiajing Ling Jul 2023

Dissertations and Theses Collection (Open Access)

Reinforcement learning is a widely used approach to tackle problems in sequential decision making where an agent learns from rewards or penalties. However, in decision-making problems that involve safety or limited resources, the agent's exploration is often limited by constraints. To model such problems, constrained Markov decision processes and constrained decentralized partially observable Markov decision processes have been proposed for single-agent and multi-agent settings, respectively. A significant challenge in solving constrained Dec-POMDP is determining the contribution of each agent to the primary objective and constraint violations. To address this issue, we propose a fictitious play-based method that uses Lagrangian Relaxation …


An Investigation Into Machine Learning Techniques For Designing Dynamic Difficulty Agents In Real-Time Games, Ryan Adare Dunagan Jun 2023

Electronic Theses and Dissertations

Video games are an incredibly popular pastime enjoyed by people of all ages worldwide. Many different kinds of games exist, but most games feature some element of the player overcoming a challenge, usually through gameplay. These challenges are insurmountable for some people and may turn them off to video games as a pastime. Games can be made more accessible to players of little skill and/or experience through the use of Dynamic Difficulty Adjustment (DDA) systems that adjust the difficulty of the game in response to the player’s performance. This research seeks to establish the effectiveness of machine learning techniques …


Detecting Complex Cyber Attacks Using Decoys With Online Reinforcement Learning, Marcus Gutierrez May 2023

Open Access Theses & Dissertations

Most vulnerabilities discovered in cybersecurity can be associated with a single piece of software. I investigate complex vulnerabilities, which may require multiple pieces of software to be present. These complex vulnerabilities represent 16.6% of all documented vulnerabilities and are more dangerous on average than their simple counterparts. In addition, because they often require multiple pieces of software to be present, they are harder to identify overall, as specific combinations are needed for the vulnerability to appear.

I consider the motivating scenario where an attacker is repeatedly deploying exploits that use complex vulnerabilities on an airport Wi-Fi network. The network …


Optimizing Constraint Selection In A Design Verification Environment For Efficient Coverage Closure, Vanessa Cooper Jan 2023

CCE Theses and Dissertations

No abstract provided.


Navigating Classic Atari Games With Deep Learning, Ayan Abhiranya Singh Jan 2023

Master's Projects

Games for the Atari 2600 console provide great environments for testing reinforcement learning algorithms. In reinforcement learning algorithms, an agent typically learns about its environment via the delivery of periodic rewards. Deep Q-Learning, a variant of Q-Learning, utilizes neural networks which train a Q-function to predict the highest future reward given an input state and action. Deep Q-learning has shown great results in training agents to play Atari 2600 games like Space Invaders and Breakout. However, Deep Q-Learning has historically struggled with learning how to play games with greater emphasis on exploration and delayed rewards, like Ms. PacMan. In this …


The Basil Technique: Bias Adaptive Statistical Inference Learning Agents For Learning From Human Feedback, Jonathan Indigo Watson Jan 2023

Theses and Dissertations--Computer Science

We introduce a novel approach for learning behaviors using human-provided feedback that is subject to systematic bias. Our method, known as BASIL, models the feedback signal as a combination of a heuristic evaluation of an action's utility and a probabilistically-drawn bias value, characterized by unknown parameters. We present both the general framework for our technique and specific algorithms for biases drawn from a normal distribution. We evaluate our approach across various environments and tasks, comparing it to interactive and non-interactive machine learning methods, including deep learning techniques, using human trainers and a synthetic oracle with feedback distorted to varying degrees. …


Low-Reynolds-Number Locomotion Via Reinforcement Learning, Yuexin Liu Aug 2022

Dissertations

This dissertation summarizes computational results from applying reinforcement learning and deep neural networks to the design of artificial microswimmers in the inertialess regime, where the viscous dissipation in the surrounding fluid environment dominates and the swimmer’s inertia is completely negligible. In particular, the work in this dissertation consists of four interrelated studies of the design of microswimmers for different tasks: (1) a one-dimensional microswimmer in free space that moves towards the target via translation, (2) a one-dimensional microswimmer in a periodic domain that rotates to reach the target, (3) a two-dimensional microswimmer that switches gaits to navigate to the designated targets in …


Team Air Combat Using Model-Based Reinforcement Learning, David A. Mottice Mar 2022

Theses and Dissertations

We formulate the first generalized air combat maneuvering problem (ACMP), called the MvN ACMP, wherein M friendly AUCAVs engage against N enemy AUCAVs, developing a Markov decision process (MDP) model to control the team of M Blue AUCAVs. The MDP model leverages a 5-degree-of-freedom aircraft state transition model and formulates a directed energy weapon capability. Instead, a model-based reinforcement learning approach is adopted wherein an approximate policy iteration algorithmic strategy is implemented to attain high-quality approximate policies relative to a high performing benchmark policy. The ADP algorithm utilizes a multi-layer neural network for the value function approximation regression mechanism. One-versus-one …


Deep Reinforcement Learning For Open Multiagent System, Tianxing Zhu Jan 2022

Honors Papers

In open multiagent systems, multiple agents work together or compete to reach a goal while members of the group change over time. For example, intelligent robots that are collaborating to put out wildfires may run out of suppressants and have to leave to recharge; the rest of the robots may need to change their behaviors accordingly to better control the fires. Thus, openness requires agents to predict not only the behaviors of others but also the presence of other agents. We present a deep reinforcement learning method that adapts the proximal policy optimization algorithm to learn the optimal …


Multi-Step Prediction Using Tree Generation For Reinforcement Learning, Kevin Prakash Jan 2022

Master's Projects

The goal of reinforcement learning is to learn a policy that maximizes a reward function. In some environments with complete information, search algorithms are highly useful in simulating action sequences in a game tree. However, in many practical environments, such effective search strategies are not applicable since their state transition information may not be available. This paper proposes a novel method to approximate a game tree that enables reinforcement learning to use search strategies even in incomplete information environments. With an approximated game tree, the agent predicts all possible states multiple steps into the future and evaluates the states to …


Cloud Provisioning And Management With Deep Reinforcement Learning, Alexandru Tol Jan 2022

Master's Projects

The first web applications appeared in the early nineteen nineties. These applications were entirely hosted in-house by the companies that developed them. In the mid-2000s the concept of a digital cloud was introduced by the then-CEO of Google, Eric Schmidt. Today, most companies at least partially host their applications on proprietary servers hosted at data centers or on commercial clouds like Amazon Web Services (AWS) or Heroku.

This arrangement seems like a straightforward win-win for both parties: the customer gets rid of the hassle of maintaining a live server for their applications and …


Intelligent Traffic Management: From Practical Stochastic Path Planning To Reinforcement Learning Based City-Wide Traffic Optimization, Kamilia Ahmadi Dec 2021

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

This research focuses on intelligent traffic management, including stochastic path planning and city-scale traffic optimization. Stochastic path planning focuses on finding paths when edge weights are not fixed and change depending on the time of day or week. We then focus on minimizing the running time of the overall procedure at query time using precomputation and approximation. The city graph is partitioned into smaller groups of nodes, each represented by its exemplar. At query time, source and destination pairs are connected to their respective exemplars and the path between those exemplars is found. After this, we move toward minimizing the city …


From MDP To AlphaZero, David Robert Sewell Nov 2021

Dissertations and Theses

In this paper I will explain the AlphaGo family of algorithms starting from first principles and requiring little previous knowledge from the reader. The focus will be upon one of the more recent versions, AlphaZero, but I hope to explain the core principles that allowed these algorithms to be so successful. I will generally refer to AlphaZero as this core set of principles and will make it clear when I am referring to a specific algorithm of the AlphaGo family. AlphaZero in short combines Monte Carlo Tree Search (MCTS) with deep learning and self-play. We will see how these …


Comparative Study Of Reinforcement Learning Methods In Path Planning, Daniel Obawole Oct 2021

Electronic Theses and Dissertations

In order to perform a large variety of tasks and achieve human-level performance in complex real-world environments, an intelligent agent must be able to learn from its dynamically changing environment. Generally speaking, agents have limitations in obtaining an accurate description of the environment from what they perceive because they may not have all the information about the environment. The present research is focused on reinforcement learning algorithms, which form a distinct category in the field of machine learning because of their trial-and-error approach. Reinforcement learning is used to solve control problems based on received rewards. …


High-Density Parking For Autonomous Vehicles., Parag J. Siddique Aug 2021

Electronic Theses and Dissertations

In a common parking lot, much of the space is devoted to lanes. Lanes must not be blocked for one simple reason: a blocked car might need to leave before the car that blocks it. However, the advent of autonomous vehicles gives us an opportunity to overcome this constraint, and to achieve a higher storage capacity of cars. Taking advantage of self-parking and intelligent communication systems of autonomous vehicles, we propose puzzle-based parking, a high-density design for a parking lot. We introduce a novel method of vehicle parking, which leads to maximum parking density. We then propose a heuristic method …


Identification Of Chemical Structures And Substructures Via Deep Q-Learning And Supervised Learning Of FTIR Spectra, Joshua D. Ellis Aug 2021

MSU Graduate Theses

Fourier-transform infrared (FTIR) spectra of organic compounds can be used to compare and identify compounds. A mid-FTIR spectrum gives absorbance values of a compound over the 400-4000 cm⁻¹ range. Spectral matching is the process of comparing the spectral signatures of two or more compounds and returning a value for the similarity of the compounds based on how closely their spectra match. This process is commonly used to identify an unknown compound by searching for its spectrum’s closest match in a database of known spectra. A major limitation of this process is that it can only be used to identify …


Reinforcement Learning With Auxiliary Memory, Sterling Suggs Jun 2021

Theses and Dissertations

Deep reinforcement learning algorithms typically require vast amounts of data to train to a useful level of performance. Each time new data is encountered, the network must inefficiently update all of its parameters. Auxiliary memory units can help deep neural networks train more efficiently by separating computation from storage, and providing a means to rapidly store and retrieve precise information. We present four deep reinforcement learning models augmented with external memory, and benchmark their performance on ten tasks from the Arcade Learning Environment. Our discussion and insights will be helpful for future RL researchers developing their own memory agents.


A Reinforcement Learning Approach To Vehicle Path Optimization In Urban Environments, Shamsa Abdulla Al Hassani Jun 2021

Theses

Road traffic management in metropolitan cities and urban areas in general is an important component of Intelligent Transportation Systems (ITS). With the world's growing population and number of vehicles, a dramatic increase in road traffic is expected to put pressure on the transportation infrastructure. Therefore, there is a pressing need to devise new ways to optimize the traffic flow in order to accommodate the growing needs of transportation systems. This work proposes to use an Artificial Intelligence (AI) method based on reinforcement learning techniques for computing near-optimal vehicle itineraries applied to Vehicular Ad-hoc Networks (VANETs). These itineraries are optimized based …


Learning How To Search: Generating Effective Test Cases Through Adaptive Fitness Function Selection, Hussein Khalid Almulla Apr 2021

Theses and Dissertations

Search-based test generation is guided by feedback from one or more fitness functions (scoring functions that judge solution optimality). Choosing informative fitness functions is crucial to meeting the goals of a tester. Unfortunately, many goals (such as forcing the class-under-test to throw exceptions, increasing test suite diversity, and attaining Strong Mutation Coverage) do not have effective fitness function formulations. We propose that meeting such goals requires treating fitness function identification as a secondary optimization step. An adaptive algorithm that can vary the selection of fitness functions could adjust its selection throughout the generation process to maximize goal attainment, based on the current …


Load Balancing And Resource Allocation In Smart Cities Using Reinforcement Learning, Aseel Alorbani Feb 2021

Electronic Thesis and Dissertation Repository

Today, smart city technology is being adopted by many municipal governments to improve their services and to adapt to growing and changing urban populations. Implementing a smart city application can be one of the most challenging projects due to its complexity, requirements and constraints. Sensing devices and computing components can be numerous and heterogeneous. Increasingly, researchers working in the smart city arena are looking to leverage edge and cloud computing to support smart city development. This approach also brings a number of challenges. Two of the main challenges are resource allocation and load balancing of tasks associated with processing data …


Increasing Software Reliability Using Mutation Testing And Machine Learning, Michael Allen Stewart Jan 2021

CCE Theses and Dissertations

Mutation testing is a type of software testing proposed in the 1970s in which program statements are deliberately changed to introduce simple errors, so that test cases can be validated by checking whether they detect those errors. The goal of mutation testing was to reduce complex program errors by preventing the related simple errors. Test cases are executed against the mutant code to determine whether at least one fails, which shows that the error is detected. One major issue with this type of testing was that it became computationally intensive to generate and test all possible mutations for complex programs.

This …


Markov Decision Processes With Embedded Agents, Luke Harold Miles Jan 2021

Theses and Dissertations--Computer Science

We present Markov Decision Processes with Embedded Agents (MDPEAs), an extension of multi-agent POMDPs that allows for the modeling of environments that can change the actuators, sensors, and learning function of the agent, e.g., a household robot which could gain and lose hardware from its frame, or a sovereign software agent which could encounter viruses on computers that modify its code. We show several toy problems for which standard reinforcement-learning methods fail to converge, and give an algorithm, `just-copy-it`, which learns some of them. Unlike MDPs, MDPEAs are closed systems and hence their evolution over time can be treated as …


Playing Pong Using Q-Learning, Akash Kumar Jan 2021

West Chester University Master’s Theses

This thesis involves the use of a reinforcement learning (RL) algorithm called Q-learning to train a Q-agent to play a game of Pong against a near-perfect opponent. Compared to previous related work, which trained Pong RL agents by combining Q-learning with deep learning in an algorithm known as Deep Q-Networks, the work presented in this paper takes advantage of known environment constraints of the custom-made Pong environment to train the agent using one-step Q-learning alone. In addition, the thesis explores ways of making the Q-learning more efficient by converting Markov Decision Processes (MDPs) to Partially Observable Markov Decision Processes (POMDPs), …
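For readers unfamiliar with the term, one-step Q-learning in its textbook tabular form can be sketched as follows. This is a generic illustration, not the thesis's implementation: the state and action labels, the `q_learning_update` name, and the `alpha` and `gamma` defaults are all assumptions chosen for the example.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99):
    """Apply one tabular one-step Q-learning update in place.

    q maps (state, action) pairs to estimated returns; alpha is the
    learning rate and gamma the discount factor (both assumed values).
    """
    # Greedy estimate of the value of the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical paddle actions and state labels, for illustration only.
actions = ["up", "down", "stay"]
q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
new_value = q_learning_update(q, "s0", "up", 1.0, "s1", actions)
```

With all estimates initialized to zero, a reward of 1.0 moves the updated entry a fraction `alpha` of the way toward the target.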


Micro Grid Control Optimization With Load And Solar Prediction, Shaju Saha Dec 2020

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Using renewable energy can save money and keep the environment cleaner. Installing a solar PV system is a one-time cost but it can generate energy for a lifetime. Solar PV does not generate carbon emissions while producing power. This thesis evaluates the value of being able to make accurate predictions in the use of solar energy. It uses predicted solar power and load for a system and a battery to store the energy for future use and calculates the operating cost or profit in several designed conditions. Various factors like a different place, tuning the capacity of sources, changing buy/sell …


Deep Q Learning Applied To Stock Trading, Agnibh Dasgupta Dec 2020

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Developing a strategy for stock trading is a vital task for investors. However, it is challenging to obtain an optimal strategy, given the complex and dynamic nature of the stock market. This thesis aims to explore the applications of Reinforcement Learning with the goal of maximizing returns from market investment, keeping in mind the human aspect of trading by utilizing stock prices represented as candlestick graphs. Furthermore, the algorithm studies public interest patterns in the form of graphs extracted from Google Trends to make predictions. Deep Q learning has been used to train an agent based on fused images of stock …


Reinforcement Learning Environment For Orbital Station-Keeping, Armando Herrera III Dec 2020

Theses and Dissertations

In this thesis, a Reinforcement Learning environment for orbital station-keeping is created and tested against one of the most widely used Reinforcement Learning algorithms, Proximal Policy Optimization (PPO). This thesis also explores the foundations of Reinforcement Learning, from the taxonomy to a description of PPO, and gives a thorough explanation of the physics required to build the RL environment. Optuna optimizes PPO's hyper-parameters for the created environment via distributed computing. This thesis then presents and analyzes the results from training a PPO agent six times.