Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Articles 1 - 22 of 22
Full-Text Articles in Physical Sciences and Mathematics
Multi-View Hypergraph Contrastive Policy Learning For Conversational Recommendation, Sen Zhao, Wei Wei, Xian-Ling Mao, Shuai: Yang Zhu, Zujie Wen, Dangyang Chen, Feida Zhu
Research Collection School Of Computing and Information Systems
Conversational recommendation systems (CRS) aim to interactively acquire user preferences and accordingly recommend items to users. Accurately learning the dynamic user preferences is of crucial importance for CRS. Previous works learn the user preferences with pairwise relations from the interactive conversation and item knowledge, while largely ignoring the fact that factors for a relationship in CRS are multiplex. Specifically, the user likes/dislikes the items that satisfy some attributes (Like/Dislike view). Moreover, social influence is another important factor that affects user preference towards the item (Social view), yet it is largely ignored by previous works in CRS. The user preferences from these …
Reinforced Adaptation Network For Partial Domain Adaptation, Keyu Wu, Min Wu, Zhenghua Chen, Ruibing Jin, Wei Cui, Zhiguang Cao, Xiaoli Li
Research Collection School Of Computing and Information Systems
Domain adaptation enables generalized learning in new environments by transferring knowledge from label-rich source domains to label-scarce target domains. As a more realistic extension, partial domain adaptation (PDA) relaxes the assumption of a fully shared label space, and instead deals with the scenario where the target label space is a subset of the source label space. In this paper, we propose a Reinforced Adaptation Network (RAN) to address the challenging PDA problem. Specifically, a deep reinforcement learning model is proposed to learn source data selection policies. Meanwhile, a domain adaptation model is presented to simultaneously determine rewards and learn domain-invariant feature …
End-To-End Hierarchical Reinforcement Learning With Integrated Subgoal Discovery, Shubham Pateria, Budhitama Subagdja, Ah-Hwee Tan, Chai Quek
Research Collection School Of Computing and Information Systems
Hierarchical reinforcement learning (HRL) is a promising approach to perform long-horizon goal-reaching tasks by decomposing the goals into subgoals. In a holistic HRL paradigm, an agent must autonomously discover such subgoals and also learn a hierarchy of policies that uses them to reach the goals. Recently introduced end-to-end HRL methods accomplish this by using the higher-level policy in the hierarchy to directly search for useful subgoals in a continuous subgoal space. However, learning such a policy may be challenging when the subgoal space is large. We propose integrated discovery of salient subgoals (LIDOSS), an end-to-end HRL method with an integrated …
Approximate Difference Rewards For Scalable Multigent Reinforcement Learning, Arambam James Singh, Akshat Kumar
Research Collection School Of Computing and Information Systems
We address the problem of multiagent credit assignment in a large-scale multiagent system. Difference rewards (DRs) are an effective tool to tackle this problem, but their exact computation is known to be challenging even for a small number of agents. We propose a scalable method to compute difference rewards based on aggregate information in a multiagent system with a large number of agents by exploiting the symmetry present in several practical applications. Empirical evaluation on two multiagent domains, air-traffic control and cooperative navigation, shows better solution quality than previous approaches.
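As a rough illustration of the quantity this abstract refers to, the sketch below computes exact difference rewards under the common textbook definition: agent i's reward is the global reward minus the global reward obtained when agent i's action is swapped for a default action. This is not the approximate method proposed in the paper; the function names, the `default_action` convention, and the toy `congestion_reward` are assumptions made up for the example.

```python
from typing import Callable, Hashable, List, Optional, Sequence


def difference_rewards(
    joint_action: Sequence[Optional[Hashable]],
    global_reward: Callable[[Sequence[Optional[Hashable]]], float],
    default_action: Optional[Hashable],
) -> List[float]:
    """Exact difference reward for every agent.

    D_i = G(a) - G(a with a_i replaced by default_action)
    """
    base = global_reward(joint_action)
    rewards = []
    for i in range(len(joint_action)):
        # Counterfactual joint action in which only agent i is changed.
        counterfactual = list(joint_action)
        counterfactual[i] = default_action
        rewards.append(base - global_reward(counterfactual))
    return rewards


def congestion_reward(joint_action, capacity=2):
    """Hypothetical global reward: each zone pays out for at most
    `capacity` agents; an action of None means the agent stays idle."""
    counts = {}
    for zone in joint_action:
        if zone is not None:
            counts[zone] = counts.get(zone, 0) + 1
    return float(sum(min(n, capacity) for n in counts.values()))


# Three agents crowd zone "A" (capacity 2); one agent serves zone "B".
actions = ["A", "A", "A", "B"]
dr = difference_rewards(actions, congestion_reward, default_action=None)
print(dr)  # [0.0, 0.0, 0.0, 1.0]
```

In this toy case, removing any single agent from the overcrowded zone leaves the global reward unchanged, so only the agent in zone "B" receives credit. The loop evaluates the global reward once per agent, which hints at why exact computation becomes costly as the number of agents grows.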
Learning Index Policies For Restless Bandits With Application To Maternal Healthcare, Arpita Biswas, Gaurav Aggarwal, Pradeep Varakantham, Milind Tambe
Research Collection School Of Computing and Information Systems
In many community health settings, it is crucial to have a systematic monitoring and intervention process to ensure that the patients adhere to healthcare programs, such as periodic health checks or taking medications. When these interventions are expensive, they can be provided to only a fixed small fraction of the patients at any period of time. Hence, it is important to carefully choose the beneficiaries who should be provided with interventions and when. We model this scenario as a restless multi-armed bandit (RMAB) problem, where each beneficiary is assumed to transition from one state to another depending on the intervention …
Using Reinforcement Learning To Minimize The Probability Of Delay Occurrence In Transportation, Zhiguang Cao, Hongliang Guo, Wen Song, Kaizhou Gao, Zhenghua Chen, Le Zhang, Xuexi Zhang
Research Collection School Of Computing and Information Systems
Reducing traffic delay is of crucial importance for the development of sustainable transportation systems, which is a challenging task in the studies of the stochastic shortest path (SSP) problem. Existing methods that solve the SSP problem based on the probability tail model seek the path that minimizes the probability of delay occurrence, which is equivalent to maximizing the probability of reaching the destination before a deadline (i.e., arriving on time). However, they suffer from low accuracy or high computational cost. Therefore, we design a novel and practical Q-learning approach where the converged Q-values have the practical meaning as the actual …
Multi-Agent Collaborative Exploration Through Graph-Based Deep Reinforcement Learning, Tianze Luo, Budhitama Subagdja, Ah-Hwee Tan
Research Collection School Of Computing and Information Systems
Autonomous exploration by a single or multiple agents in an unknown environment leads to various applications in automation, such as cleaning, search and rescue, etc. Traditional methods normally take frontier locations and segmented regions of the environment into account to efficiently allocate target locations to different agents to visit. They may employ ad hoc solutions to allocate the task to the agents, but the allocation may not be efficient. In the literature, few studies have focused on enhancing the traditional methods by applying machine learning models for agent performance improvement. In this paper, we propose a graph-based deep reinforcement learning approach …
Probabilistic Guided Exploration For Reinforcement Learning In Self-Organizing Neural Networks, Peng Wang, Weigui Jair Zhou, Di Wang, Ah-Hwee Tan
Research Collection School Of Computing and Information Systems
Exploration is essential in reinforcement learning, which expands the search space of potential solutions to a given problem for performance evaluations. Specifically, a carefully designed exploration strategy may help the agent learn faster by taking advantage of what it has learned previously. However, many reinforcement learning mechanisms still adopt simple exploration strategies, which select actions in a purely random manner among all the feasible actions. In this paper, we propose novel mechanisms to improve the existing knowledge-based exploration strategy based on a probabilistic guided approach to select actions. We conduct extensive experiments in a Minefield navigation simulator and the results …
Adopt: Combining Parameter Tuning And Adaptive Operator Ordering For Solving A Class Of Orienteering Problems, Aldy Gunawan, Hoong Chuin Lau, Kun Lu
Research Collection School Of Computing and Information Systems
Two fundamental challenges in local search based metaheuristics are how to determine parameter configurations and design the underlying Local Search (LS) procedure. In this paper, we propose a framework to handle both challenges, called ADaptive OPeraTor Ordering (ADOPT). The ADOPT framework is applied to two metaheuristics, namely Iterated Local Search (ILS) and a hybridization of Simulated Annealing and ILS (SAILS), for solving two variants of the Orienteering Problem: the Team Dependent Orienteering Problem (TDOP) and the Team Orienteering Problem with Time Windows (TOPTW). This framework consists of two main processes. The Design of Experiment (DOE) …
Modeling Trajectories With Recurrent Neural Networks, Hao Wu, Ziyang Chen, Weiwei Sun, Baihua Zheng, Wei Wang
Research Collection School Of Computing and Information Systems
Modeling trajectory data is a building block for many smart-mobility initiatives. Existing approaches apply shallow models such as Markov chains and inverse reinforcement learning to model trajectories, which cannot capture the long-term dependencies. On the other hand, deep models such as the Recurrent Neural Network (RNN) have demonstrated their strength in modeling variable length sequences. However, directly adopting RNN to model trajectories is not appropriate because of the unique topological constraints faced by trajectories. Motivated by these findings, we design two RNN-based models which can take full advantage of the strength of RNN to capture variable length sequence and meanwhile to …
Towards Autonomous Behavior Learning Of Non-Player Characters In Games, Shu Feng, Ah-Hwee Tan
Research Collection School Of Computing and Information Systems
Non-player characters (NPCs), as found in computer games, can be modelled as intelligent systems, which serve to improve the interactivity and playability of the games. Although reinforcement learning (RL) has been a promising approach to creating the behavior models of NPCs, an initial stage of exploration and low performance is typically required. On the other hand, imitative learning (IL) is an effective approach to pre-building an NPC’s behavior model by observing the opponent’s actions, but learning by imitation limits the agent’s performance to that of its opponents. In view of their complementary strengths, this paper proposes a computational model …
Adaptive Duty Cycling In Sensor Networks With Energy Harvesting Using Continuous-Time Markov Chain And Fluid Models, Ronald Wai Hong Chan, Pengfei Zhang, Ido Nevat, Sai Ganesh Nagarajan, Alvin Cerdena Valera, Hwee Xian Tan
Research Collection School Of Computing and Information Systems
The dynamic and unpredictable nature of energy harvesting sources available for wireless sensor networks, and the time variation in network statistics like packet transmission rates and link qualities, necessitate the use of adaptive duty cycling techniques. Such adaptive control allows sensor nodes to achieve long-run energy neutrality, where energy supply and demand are balanced in a dynamic environment such that the nodes function continuously. In this paper, we develop a new framework enabling an adaptive duty cycling scheme for sensor networks that takes into account the node battery level, ambient energy that can be harvested, and application-level QoS requirements. We …
A Comparative Study Between Motivated Learning And Reinforcement Learning, James T. Graham, Janusz A. Starzyk, Zhen Ni, Haibo He, T.-H. Teng, Ah-Hwee Tan
Research Collection School Of Computing and Information Systems
This paper analyzes advanced reinforcement learning techniques and compares some of them to motivated learning. Motivated learning is briefly discussed indicating its relation to reinforcement learning. A black box scenario for comparative analysis of learning efficiency in autonomous agents is developed and described. This is used to analyze selected algorithms. Reported results demonstrate that in the selected category of problems, motivated learning outperformed all reinforcement learning algorithms we compared with.
Integrating Motivated Learning And K-Winner-Take-All To Coordinate Multi-Agent Reinforcement Learning, Teck-Hou Teng, Ah-Hwee Tan, Janusz Starzyk, Yuan-Sin Tan, Loo-Nin Teow
Research Collection School Of Computing and Information Systems
This work addresses the coordination issue in distributed optimization problems (DOPs) where multiple distinct and time-critical tasks are performed to satisfy a global objective function. The performance of these tasks has to be coordinated due to the sharing of consumable resources and the dependency on non-consumable resources. Knowing that it can be sub-optimal to predefine the performance of the tasks for large DOPs, the multi-agent reinforcement learning (MARL) framework is adopted wherein an agent is used to learn the performance of each distinct task using reinforcement learning. To coordinate MARL, we propose a novel coordination strategy integrating Motivated Learning (ML) …
Adaptive Computer‐Generated Forces For Simulator‐Based Training, Expert Systems With Applications, Teck-Hou Teng, Ah-Hwee Tan, Loo-Nin Teow
Research Collection School Of Computing and Information Systems
Simulator-based training is in constant pursuit of an increasing level of realism. The transition from doctrine-driven computer-generated forces (CGF) to adaptive CGF represents one such effort. The use of doctrine-driven CGF is fraught with challenges such as modeling of complex expert knowledge and adapting to the trainees’ progress in real time. Therefore, this paper reports on how the use of adaptive CGF can overcome these challenges. Using a self-organizing neural network to implement the adaptive CGF, air combat maneuvering strategies are learned incrementally and generalized in real time. The state space and action space are extracted from the same hierarchical doctrine …
Motivated Learning For The Development Of Autonomous Agents, Janusz A. Starzyk, James T. Graham, Pawel Raif, Ah-Hwee Tan
Research Collection School Of Computing and Information Systems
A new machine learning approach known as motivated learning (ML) is presented in this work. Motivated learning drives a machine to develop abstract motivations and choose its own goals. ML also provides a self-organizing system that controls a machine’s behavior based on competition between dynamically-changing pain signals. This provides an interplay of externally driven and internally generated control signals. It is demonstrated that ML not only yields a more sophisticated learning mechanism and system of values than reinforcement learning (RL), but is also more efficient in learning complex relations and delivers better performance than RL in dynamically changing environments. In …
Self‐Regulating Action Exploration In Reinforcement Learning, Teck-Hou Teng, Ah-Hwee Tan, Yuan-Sin Tan
Research Collection School Of Computing and Information Systems
The basic tenet of a learning process is for an agent to learn for only as much and as long as it is necessary. With reinforcement learning, the learning process is divided between exploration and exploitation. Given the complexity of the problem domain and the randomness of the learning process, the exact duration of the reinforcement learning process can never be known with certainty. Using an inaccurate number of training iterations leads either to the non-convergence or the over-training of the learning agent. This work addresses such issues by proposing a technique to self-regulate the exploration rate and training duration …
Cooperative Reinforcement Learning In Topology-Based Multi-Agent Systems, Dan Xiao, Ah-Hwee Tan
Research Collection School Of Computing and Information Systems
Topology-based multi-agent systems (TMAS), wherein agents interact with one another according to their spatial relationship in a network, are well suited for problems with topological constraints. In a TMAS system, however, each agent may have a different state space, which can be rather large. Consequently, traditional approaches to multi-agent cooperative learning may not be able to scale up with the complexity of the network topology. In this paper, we propose a cooperative learning strategy, under which autonomous agents are assembled in a binary tree formation (BTF). By constraining the interaction between agents, we effectively unify the state space of individual …
A Hybrid Agent Architecture Integrating Desire, Intention And Reinforcement Learning, Ah-Hwee Tan, Yew-Soon Ong, Akejariyawong Tapanuj
Research Collection School Of Computing and Information Systems
This paper presents a hybrid agent architecture that integrates the behaviours of BDI agents, specifically desire and intention, with a neural network based reinforcement learner known as Temporal Difference-Fusion Architecture for Learning and COgNition (TD-FALCON). With the explicit maintenance of goals, the agent performs reinforcement learning with the awareness of its objectives instead of relying on external reinforcement signals. More importantly, the intention module equips the hybrid architecture with deliberative planning capabilities, enabling the agent to purposefully maintain an agenda of actions to perform and reducing the need for constantly sensing the environment. Through reinforcement learning, plans can also be …
A Self-Organizing Neural Architecture Integrating Desire, Intention And Reinforcement Learning, Ah-Hwee Tan, Yu-Hong Feng, Yew-Soon Ong
Research Collection School Of Computing and Information Systems
This paper presents a self-organizing neural architecture that integrates the features of belief, desire, and intention (BDI) systems with reinforcement learning. Based on fusion Adaptive Resonance Theory (fusion ART), the proposed architecture provides a unified treatment for both intentional and reactive cognitive functionalities. Operating with a sense-act-learn paradigm, the low level reactive module is a fusion ART network that learns action and value policies across the sensory, motor, and feedback channels. During performance, the actions executed by the reactive module are tracked by a high level intention module (also a fusion ART network) that learns to associate sequences of actions …
Self-Organizing Neural Models Integrating Rules And Reinforcement Learning, Teck-Hou Teng, Zhong-Ming Tan, Ah-Hwee Tan
Research Collection School Of Computing and Information Systems
Traditional approaches to integrating knowledge into neural networks are concerned mainly with supervised learning. This paper presents how a family of self-organizing neural models known as fusion architecture for learning, cognition and navigation (FALCON) can incorporate a priori knowledge and perform knowledge refinement and expansion through reinforcement learning. Symbolic rules are formulated based on pre-existing know-how and inserted into FALCON as a priori knowledge. The availability of knowledge enables FALCON to start performing earlier in the initial learning trials. Through a temporal-difference (TD) learning method, the inserted rules can be refined and expanded according to the evaluative feedback signals received …
Integrating Temporal Difference Methods And Self‐Organizing Neural Networks For Reinforcement Learning With Delayed Evaluative Feedback, Ah-Hwee Tan, Ning Lu, Dan Xiao
Research Collection School Of Computing and Information Systems
This paper presents a neural architecture for learning category nodes encoding mappings across multimodal patterns involving sensory inputs, actions, and rewards. By integrating adaptive resonance theory (ART) and temporal difference (TD) methods, the proposed neural model, called TD fusion architecture for learning, cognition, and navigation (TD-FALCON), enables an autonomous agent to adapt and function in a dynamic environment with immediate as well as delayed evaluative feedback (reinforcement) signals. TD-FALCON learns the value functions of the state-action space estimated through on-policy and off-policy TD learning methods, specifically state-action-reward-state-action (SARSA) and Q-learning. The learned value functions are then used to determine the …
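For readers unfamiliar with the two TD methods named in this abstract, the sketch below contrasts the standard tabular update rules: SARSA (on-policy) bootstraps from the action actually taken next, while Q-learning (off-policy) bootstraps from the greedy next action. It is a minimal tabular illustration only and does not reproduce TD-FALCON, which estimates values with a self-organizing neural network; the function names, the dictionary-based table, and the values of `alpha` and `gamma` are assumptions for the example.

```python
from collections import defaultdict


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """On-policy TD update: the target uses the next action actually chosen."""
    target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])


def q_learning_update(q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Off-policy TD update: the target uses the best next action."""
    target = r + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])


ACTIONS = ["left", "right"]


def fresh_table():
    """Toy value table (hypothetical numbers) keyed by (state, action)."""
    q = defaultdict(float)
    q[(1, "left")] = 1.0
    q[(1, "right")] = 2.0
    return q


# Same transition (state 0, "right", reward 0, state 1) under both rules.
q_sarsa = fresh_table()
sarsa_update(q_sarsa, 0, "right", 0.0, 1, "left")  # next action taken: "left"

q_qlearn = fresh_table()
q_learning_update(q_qlearn, 0, "right", 0.0, 1, ACTIONS)

print(q_sarsa[(0, "right")], q_qlearn[(0, "right")])
```

With these toy numbers the SARSA estimate moves toward the value of the exploratory next action, while the Q-learning estimate moves toward the value of the greedy one, so the two tables diverge after a single step.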