Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Machine Learning Faculty Publications

Maximum likelihood estimation

Full-Text Articles in Physical Sciences and Mathematics

Efficient (Soft) Q-Learning For Text Generation With Limited Good Data, Han Guo, Bowen Tan, Zhengzhong Liu, Eric P. Xing, Zhiting Hu, Dec 2022

Maximum likelihood estimation (MLE) is the predominant algorithm for training text generation models. This paradigm relies on direct supervision examples, which makes it inapplicable to many emerging applications, such as generating adversarial attacks or generating prompts to control language models. Reinforcement learning (RL), on the other hand, offers a more flexible solution by allowing users to plug in arbitrary task metrics as reward. Yet previous RL algorithms for text generation, such as policy gradient (on-policy RL) and Q-learning (off-policy RL), are often notoriously inefficient or unstable to train due to the large sequence space and the sparse reward received only …