Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Artificial Intelligence and Robotics

Reinforcement Learning

Articles 1 - 30 of 52

Full-Text Articles in Entire DC Network

Policy Gradient Methods: Analysis, Misconceptions, And Improvements, Christopher P. Nota Mar 2024

Doctoral Dissertations

Policy gradient methods are a class of reinforcement learning algorithms that optimize a parametric policy by maximizing an objective function that directly measures the performance of the policy. Despite being used in many high-profile applications of reinforcement learning, the conventional use of policy gradient methods in practice deviates from existing theory. This thesis presents a comprehensive mathematical analysis of policy gradient methods, uncovering misconceptions and suggesting novel solutions to improve their performance. We first demonstrate that the update rule used by most policy gradient methods does not correspond to the gradient of any objective function due to the way the …


Smart Applications And Resource Management In Internet Of Things, Zeinab Akhavan Dec 2023

Computer Science ETDs

Internet of Things (IoT) technologies are currently the principal solutions driving smart cities. New technologies such as cyber-physical systems, 5G, and data analytics have emerged to address various city infrastructure issues ranging from transportation and energy management to healthcare systems. An IoT setting primarily consists of a wide range of users and devices forming a massive network that interacts with different layers of the city infrastructure, generating a large volume of data to enable smart city services. The goal of smart city services is to create value for the entire ecosystem, whether this is health, education, transportation, energy, …


Online Aircraft System Identification Using A Novel Parameter Informed Reinforcement Learning Method, Nathan Schaff Oct 2023

Doctoral Dissertations and Master's Theses

This thesis presents the development and analysis of a novel method for training reinforcement learning neural networks for online aircraft system identification of multiple similar linear systems, such as all fixed wing aircraft. This approach, termed Parameter Informed Reinforcement Learning (PIRL), dictates that reinforcement learning neural networks should be trained using input and output trajectory/history data as is convention; however, the PIRL method also includes any known and relevant aircraft parameters, such as airspeed, altitude, center of gravity location and/or others. Through this, the PIRL Agent is better suited to identify novel/test-set aircraft.

First, the PIRL method is applied to …


Quantifying Balance: Computational And Learning Frameworks For The Characterization Of Balance In Bipedal Systems, Kubra Akbas Aug 2023

Dissertations

In clinical practice and general healthcare settings, the lack of reliable and objective balance and stability assessment metrics hinders the tracking of patient performance progression during rehabilitation; the assessment of bipedal balance plays a crucial role in understanding stability and falls in humans and other bipeds, while providing clinicians important information regarding rehabilitation outcomes. Bipedal balance has often been examined through kinematic or kinetic quantities, such as the Zero Moment Point and Center of Pressure; however, analyzing balance specifically through the body's Center of Mass (COM) state offers a holistic and easily comprehensible view of balance and stability.

Building upon …


Motion Synthesis And Control For Autonomous Agents Using Generative Models And Reinforcement Learning, Pei Xu Aug 2023

All Dissertations

Imitating and predicting human motions have wide applications in both graphics and robotics, from developing realistic models of human movement and behavior in immersive virtual worlds and games to improving autonomous navigation for service agents deployed in the real world. Traditional approaches for motion imitation and prediction typically rely on pre-defined rules to model agent behaviors or use reinforcement learning with manually designed reward functions. Despite impressive results, such approaches cannot effectively capture the diversity of motor behaviors and the decision making capabilities of human beings. Furthermore, manually designing a model or reward function to explicitly describe human motion characteristics …


Imitating Opponent To Win: Adversarial Policy Imitation Learning In Two-Player Competitive Games, The Viet Bui, Tien Mai, Thanh H. Nguyen Jun 2023

Research Collection School Of Computing and Information Systems

Recent research on vulnerabilities of deep reinforcement learning (RL) has shown that adversarial policies adopted by an adversary agent can influence a target RL agent (victim agent) to perform poorly in a multi-agent environment. In existing studies, adversarial policies are directly trained based on experiences of interacting with the victim agent. There is a key shortcoming of this approach: knowledge derived from historical interactions may not be properly generalized to unexplored policy regions of the victim agent, making the trained adversarial policy significantly less effective. In this work, we design a new effective adversarial policy learning algorithm that overcomes …


Gaslight: Attacking Hard-Label Black-Box Classifiers Via Deep Reinforcement Learning, Rajat Sethi May 2023

All Theses

Through artificial intelligence, algorithms can classify arrays of data, such as images or videos, into a predefined set of categories. With enough labeled data, a classifier can analyze an input’s components and calculate confidence scores for each category. However, machine learning relies heavily on approximation, which allows attackers to exploit classifiers by providing adversarial examples. Specifically, attackers can modify their input so that the victim classifier cannot correctly label it, while a human observer would be unable to notice the difference. This thesis proposes Gaslight, a system that uses deep reinforcement learning to generate adversarial examples against a victim classifier. …


Rigorous Experimentation For Reinforcement Learning, Scott M. Jordan Apr 2023

Doctoral Dissertations

Scientific fields make advancements by leveraging the knowledge created by others to push the boundary of understanding. The primary tool in many fields for generating knowledge is empirical experimentation. Although common, generating accurate knowledge from empirical experiments is often challenging due to inherent randomness in execution and confounding variables that can obscure the correct interpretation of the results. As such, researchers must hold themselves and others to a high degree of rigor when designing experiments. Unfortunately, most reinforcement learning (RL) experiments lack this rigor, making the knowledge generated from experiments dubious. This dissertation proposes methods to address central issues in …


Peer-To-Peer Energy Trading In Smart Residential Environment With User Behavioral Modeling, Ashutosh Timilsina Jan 2023

Theses and Dissertations--Computer Science

Electric power systems are transforming from a centralized unidirectional market to a decentralized open market. With this shift, end-users can actively participate in local energy exchanges, with or without the involvement of the main grid. Rapidly falling prices for Renewable Energy Technologies (RETs), supported by their ease of installation and operation, together with Electric Vehicle (EV) and Smart Grid (SG) technologies that make bidirectional flow of energy possible, have contributed to this changing landscape on the distribution side of the traditional power grid.

Trading energy among users in a decentralized fashion has been referred …


Practical Ai Value Alignment Using Stories, Md Sultan Al Nahian Jan 2023

Theses and Dissertations--Computer Science

As more machine learning agents interact with humans, there is a growing prospect that an agent trained to perform a task optimally, using only a measure of task performance as feedback, can violate societal norms for acceptable behavior or cause harm. Consequently, it becomes necessary to prioritize task performance and ensure that AI actions do not have detrimental effects. Value alignment is a property of intelligent agents, wherein they solely pursue goals and activities that are non-harmful and beneficial to humans. Current approaches to value alignment largely depend on imitation learning or learning from demonstration methods. However, the dynamic nature …


Airport Assignment For Emergency Aircraft Using Reinforcement Learning, Saketh Kamatham Jan 2023

Master's Projects

The volume of air traffic is increasing exponentially every day. The Air Traffic Control (ATC) at the airport has to handle aircraft runway assignments for landing and takeoff, as well as airspace maintenance by directing passing aircraft through the airspace safely. If an aircraft is facing a technical issue or problem and is in a state of emergency, it requires an expedited landing to respond to that emergency. The ATC gives this aircraft priority for landing and assistance. This process is very strenuous, as the ATC has to deal with multiple aspects along with the emergency aircraft. It is the duty of the …


Algorithmic Improvements In Deep Reinforcement Learning, Norman L. Tasfi Dec 2022

Electronic Thesis and Dissertation Repository

Reinforcement Learning (RL) has seen exponential performance improvements over the past decade, achieving super-human performance across many domains. Deep Reinforcement Learning (DRL), the combination of RL methods with deep neural networks (DNN) as function approximators, has unlocked much of this progress. The path to generalized artificial intelligence (GAI) will depend on deep learning (DL) and RL. However, much work is required before the technology reaches anything resembling GAI. Therefore, this thesis focuses on a subset of areas within RL that require additional research to advance the field, specifically: sample efficiency, planning, and task transfer. The first area, sample efficiency, refers …


Reinforcement Learning Approach To Coordinate Real-World Multi-Agent Dynamic Routing And Scheduling, Joe Waldy Nov 2022

Dissertations and Theses Collection (Open Access)

In this thesis, we study new variants of routing and scheduling problems motivated by real-world problems from the urban logistics and law enforcement domains. In particular, we focus on two key aspects: dynamic and multi-agent. While routing problems such as the Vehicle Routing Problem (VRP) are well studied in the Operations Research (OR) community, we know that in real-world route planning today, initially planned routes and schedules may be disrupted by dynamically occurring events. In addition, routing and scheduling plans cannot be made in silos due to the presence of other agents, which may be independent and self-interested. These requirements create …


Adaptive Multi-Scale Place Cell Representations And Replay For Spatial Navigation And Learning In Autonomous Robots, Pablo Scleidorovich Oct 2022

USF Tampa Graduate Theses and Dissertations

Place cells are among the most widely studied neurons and are thought to play a vital role in spatial cognition. Extensive studies show that their activity in the rodent hippocampus is highly correlated with the animal’s spatial location, forming “place fields” of smaller sizes near the dorsal pole and larger sizes near the ventral pole. Despite advances, it is still unclear how this multi-scale representation enables navigation in complex environments.

In this dissertation, we analyze the place cell representation from a computational point of view, evaluating how multi-scale place fields impact navigation in large and cluttered environments. The objectives are to …


Symplectically Integrated Symbolic Regression Of Hamiltonian Dynamical Systems, Daniel Dipietro Jun 2022

Computer Science Senior Theses

Here we present Symplectically Integrated Symbolic Regression (SISR), a novel technique for learning physical governing equations from data. SISR employs a deep symbolic regression approach, using a multi-layer LSTM-RNN with mutation to probabilistically sample Hamiltonian symbolic expressions. Using symplectic neural networks, we develop a model-agnostic approach for extracting meaningful physical priors from the data that can be imposed on-the-fly into the RNN output, limiting its search space. Hamiltonians generated by the RNN are optimized and assessed using a fourth-order symplectic integration scheme; prediction performance is used to train the LSTM-RNN to generate increasingly better functions via a risk-seeking policy gradients …


Reinforcement Learning Approach To Solve Dynamic Bi-Objective Police Patrol Dispatching And Rescheduling Problem, Waldy Joe, Hoong Chuin Lau, Jonathan Pan Jun 2022

Research Collection School Of Computing and Information Systems

Police patrol aims to fulfill two main objectives, namely to project presence and to respond to incidents in a timely manner. Incidents happen dynamically and can disrupt the initially planned patrol schedules. The key decisions are which patrol agent to dispatch in response to an incident and, subsequently, how to adapt the patrol schedules in response to such dynamically occurring incidents while still fulfilling both objectives, which can sometimes conflict. In this paper, we define this real-world problem as a Dynamic Bi-Objective Police Patrol Dispatching and Rescheduling Problem and propose a solution approach that combines Deep …


Hierarchical Value Decomposition For Effective On-Demand Ride Pooling, Hao Jiang, Pradeep Varakantham May 2022

Research Collection School Of Computing and Information Systems

On-demand ride-pooling (e.g., UberPool, GrabShare) services focus on serving multiple different customer requests using each vehicle, i.e., an empty or partially filled vehicle can be assigned requests from different passengers with different origins and destinations. On the other hand, in Taxi on Demand (ToD) services (e.g., UberX), one vehicle is assigned to only one request at a time. On-demand ride pooling is not only beneficial to customers (lower cost), drivers (higher revenue per trip) and aggregation companies (higher revenue), but is also of crucial importance to the environment as it reduces the number of vehicles required on the roads. Since …


Decision-Analytic Models Using Reinforcement Learning To Inform Dynamic Sequential Decisions In Public Policy, Seyedeh Nazanin Khatami Mar 2022

Doctoral Dissertations

We developed decision-analytic models specifically suited for long-term sequential decision-making in the context of large-scale dynamic stochastic systems, focusing on public policy investment decisions. We found that while machine learning and artificial intelligence algorithms provide the most suitable frameworks for such analyses, multiple challenges arise in their successful adaptation. We address three specific challenges in two public sectors, public health and climate policy, through the following three essays. In Essay I, we developed a reinforcement learning (RL) model to identify the optimal sequence of testing and retention-in-care interventions to inform the national strategic plan “Ending the HIV Epidemic in the US”. …


Analyzing Decision-Making In Robot Soccer For Attacking Behaviors, Justin Rodney Mar 2022

USF Tampa Graduate Theses and Dissertations

In robot soccer, decision-making is critical to the performance of a team’s software system. The University of South Florida’s (USF) RoboBulls team implements behavior for the robots by using traditional methods such as analytical geometry to plan paths and determine whether an action should be taken. In recent works, Machine Learning (ML) and Reinforcement Learning (RL) techniques have been used to calculate the probability of success for a pass or goal, and even to train models for performing low-level skills such as traveling towards a ball and shooting it towards the goal [1, 2]. Open-source frameworks have been created for training Reinforcement Learning …


Iseeq: Information Seeking Question Generation Using Dynamic Meta-Information Retrieval And Knowledge Graphs, Manas Gaur, Kalpa Gunaratna, Vijay Srinivasan, Hongxia Jin Feb 2022

Publications

Conversational Information Seeking (CIS) is a relatively new research area within conversational AI that attempts to seek information from end-users in order to understand and satisfy users’ needs. If realized, such a system has far-reaching benefits in the real world; for example, a CIS system can assist clinicians in pre-screening or triaging patients in healthcare. A key open sub-problem in CIS that remains unaddressed in the literature is generating Information Seeking Questions (ISQs) based on a short initial query from the end user. To address this open problem, we propose Information SEEking Question generator (ISEEQ), a novel approach for generating …


Reinforcement Learning: Low Discrepancy Action Selection For Continuous States And Actions, Jedidiah Lindborg Jan 2022

Electronic Theses and Dissertations

In reinforcement learning the process of selecting an action during the exploration or exploitation stage is difficult to optimize. The purpose of this thesis is to create an action selection process for an agent by employing a low discrepancy action selection (LDAS) method. This should allow the agent to quickly determine the utility of its actions by prioritizing actions that are dissimilar to ones that it has already picked. In this way the learning process should be faster for the agent and result in more optimal policies.


Whole File Chunk Based Deduplication Using Reinforcement Learning, Xincheng Yuan Jan 2022

Master's Projects

Deduplication is the process of removing replicated data content from storage facilities such as online databases, cloud datastores, and local file systems. It is commonly performed as part of data preprocessing to eliminate redundant data that consumes unnecessary storage space and computing power. Deduplication is especially important for file backup systems, since duplicated files will presumably consume more storage space, especially with a short backup period such as daily [8]. A common technique in this field involves splitting files into chunks whose hashes can be compared using data structures or techniques like clustering. In this project we explore the possibility …


Burst-Induced Multi-Armed Bandit For Learning Recommendation, Rodrigo Alves, Antoine Ledent, Marius Kloft Oct 2021

Research Collection School Of Computing and Information Systems

In this paper, we introduce a non-stationary and context-free Multi-Armed Bandit (MAB) problem and a novel algorithm (which we refer to as BMAB) to solve it. The problem is context-free in the sense that no side information about users or items is needed. We work in a continuous-time setting where each timestamp corresponds to a visit by a user and a corresponding decision regarding recommendation. The main novelty is that we model the reward distribution as a consequence of variations in the intensity of the activity, and thereby we assist the exploration/exploitation dilemma by exploring the temporal dynamics of the …


Reinforcement Learning Algorithms: An Overview And Classification, Fadi Almahamid, Katarina Grolinger Sep 2021

Electrical and Computer Engineering Publications

The desire to make applications and machines more intelligent and the aspiration to enable their operation without human interaction have been driving innovations in neural networks, deep learning, and other machine learning techniques. Although reinforcement learning has been primarily used in video games, recent advancements and the development of diverse and powerful reinforcement algorithms have enabled the reinforcement learning community to move from playing video games to solving complex real-life problems in autonomous systems such as self-driving cars, delivery drones, and automated robotics. Understanding the environment of an application and the algorithms’ limitations plays a vital role in selecting the …


Toward Deep Supervised Anomaly Detection: Reinforcement Learning From Partially Labeled Anomaly Data, Guansong Pang, Anton Van Den Hengel, Chunhua Shen, Longbing Cao Aug 2021

Research Collection School Of Computing and Information Systems

We consider the problem of anomaly detection with a small set of partially labeled anomaly examples and a large-scale unlabeled dataset. This is a common scenario in many important applications. Existing related methods either exclusively fit the limited anomaly examples that typically do not span the entire set of anomalies, or proceed with unsupervised learning from the unlabeled data. We propose here instead a deep reinforcement learning-based approach that enables an end-to-end optimization of the detection of both labeled and unlabeled anomalies. This approach learns the known abnormality by automatically interacting with an anomaly-biased simulation environment, while continuously extending the …


Optimization And Machine Learning Methods For Solving Combinatorial Problems In Urban Transportation, Aigerim Bogyrbayeva Jun 2021

USF Tampa Graduate Theses and Dissertations

This dissertation investigates three applications of emerging technologies for urban transportation. In the first chapter, we design a new market for fractional ownership of autonomous vehicles (AVs), in which an AV is co-leased by a group of individuals. We present a practical iterative auction based on the combinatorial clock auction to match the interested customers together and determine their payments. In designing such an auction, we consider continuous-time items (time slots) which are defined by bidders, and naturally exploit driverless mobility of AVs to form co-leasing groups. To relieve the computational burdens of both bidders and the …


Generating Effective Sentence Representations: Deep Learning And Reinforcement Learning Approaches, Mahtab Ahmed Apr 2021

Electronic Thesis and Dissertation Repository

Natural language processing (NLP) is one of the most important technologies of the information age. Understanding complex language utterances is also a crucial part of artificial intelligence. Many Natural Language applications are powered by machine learning models performing a large variety of underlying tasks. Recently, deep learning approaches have obtained very high performance across many NLP tasks. In order to achieve this high level of performance, it is crucial for computers to have an appropriate representation of sentences. The tasks addressed in the thesis are best approached having shallow semantic representations. These representations are vectors that are then embedded in …


Deep Learning For Multi-Tissue Cancer Classification Of Gene Expressions, Tarek Khorshed Jan 2021

Theses and Dissertations

We contribute to saving the lives of cancer patients through early detection and diagnosis, since one of the major challenges in cancer treatment is that patients are diagnosed at very late stages, when appropriate medical interventions become less effective and full curative treatment is no longer achievable. Cancer classification using gene expressions is extremely challenging given the complexity and high dimensionality of the data. Current classification methods typically rely on samples collected from a single tissue type and perform a prerequisite step of gene feature selection to avoid processing the full set of genes. These methods fall short in taking advantage …


Scheduling Allocation And Inventory Replenishment Problems Under Uncertainty: Applications In Managing Electric Vehicle And Drone Battery Swap Stations, Amin Asadi Jan 2021

Graduate Theses and Dissertations

In this dissertation, motivated by electric vehicle (EV) and drone application growth, we propose novel optimization problems and solution techniques for managing the operations at EV and drone battery swap stations. In Chapter 2, we introduce a novel class of stochastic scheduling allocation and inventory replenishment problems (SAIRP), which determines the recharging, discharging, and replacement decisions at a swap station over time to maximize the expected total profit. We use a Markov Decision Process (MDP) to model SAIRPs facing uncertain demands, varying costs, and battery degradation. Considering battery degradation is crucial as it relaxes the assumption that charging/discharging batteries do not …


Arrangements Of The Inputs And Outputs In The Multi-Robot Continuous Control Problem, Sida Liu Jan 2021

Graduate College Dissertations and Theses

The Multi-Robot Continuous Control (MRCC) problem in Deep Reinforcement Learning requires a single neural controller (agent) to learn to control the behavior of multiple robot bodies. When learning to control a single robot body, sensors and motors are arbitrarily connected to the input and output layers of the neural controller, respectively, and this arrangement does not affect the learnability of target robot behaviors. If and how such an arrangement can affect learnability in MRCC, when dealing with multiple robots with different body plans, is as yet unknown.

In this thesis, I demonstrate the following: (1) A neural controller can control a small …