Mitigating Safety Issues In Pre-Trained Language Models: A Model-Centric Approach Leveraging Interpretation Methods, Weicheng Ma
Dartmouth College Ph.D Dissertations
Pre-trained language models (PLMs) such as GPT-4, which powers ChatGPT, face various safety issues, including biased responses and a lack of alignment with users' backgrounds and expectations. These problems threaten their sociability and public application. Present strategies for addressing these safety concerns are primarily data-driven, requiring extensive human effort in data annotation and substantial training resources. Research indicates that the nature of these safety issues evolves over time, necessitating continual updates to data and model re-training, an approach that is both resource-intensive and time-consuming. This thesis introduces a novel, model-centric strategy for understanding and mitigating the safety issues of PLMs by …