Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Computer Sciences

Dartmouth College

Theses/Dissertations

2024

Natural Language Processing

Full-Text Articles in Entire DC Network

Mitigating Safety Issues In Pre-Trained Language Models: A Model-Centric Approach Leveraging Interpretation Methods, Weicheng Ma Jan 2024

Dartmouth College Ph.D Dissertations

Pre-trained language models (PLMs) such as GPT-4, which powers ChatGPT, face various safety issues, including biased responses and a lack of alignment with users' backgrounds and expectations. These problems threaten their sociability and public application. Current strategies for addressing these safety concerns are primarily data-driven, requiring extensive human effort in data annotation and substantial training resources. Research indicates that the nature of these safety issues evolves over time, necessitating continual updates to data and model retraining, an approach that is both resource-intensive and time-consuming. This thesis introduces a novel, model-centric strategy for understanding and mitigating the safety issues of PLMs by …