Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Full-Text Articles in Entire DC Network

Improving Automatic Transcription Using Natural Language Processing, Anna Kiefer Mar 2024

Master's Theses

Digital Democracy is a CalMatters and California Polytechnic State University initiative to promote transparency in state government by increasing access to the California legislature. While Digital Democracy is made up of many resources, one foundational step of the project is obtaining accurate, timely transcripts of California Senate and Assembly hearings. The information extracted from these transcripts provides crucial data for subsequent steps in the pipeline. In the context of Digital Democracy, upleveling is when humans verify, correct, and annotate the transcript results after the legislative hearings have been automatically transcribed. The upleveling process is done with the …


Mitigating Safety Issues In Pre-Trained Language Models: A Model-Centric Approach Leveraging Interpretation Methods, Weicheng Ma Jan 2024

Dartmouth College Ph.D Dissertations

Pre-trained language models (PLMs), like GPT-4, which powers ChatGPT, face various safety issues, including biased responses and a lack of alignment with users' backgrounds and expectations. These problems threaten their sociability and public application. Present strategies for addressing these safety concerns primarily involve data-driven approaches, requiring extensive human effort in data annotation and substantial training resources. Research indicates that the nature of these safety issues evolves over time, necessitating continual updates to data and model re-training—an approach that is both resource-intensive and time-consuming. This thesis introduces a novel, model-centric strategy for understanding and mitigating the safety issues of PLMs by …


Language Models For Rare Disease Information Extraction: Empirical Insights And Model Comparisons, Shashank Gupta Jan 2024

Theses and Dissertations--Computer Science

End-to-end relation extraction (E2ERE) is a crucial task in natural language processing (NLP) that involves identifying and classifying semantic relationships between entities in text. This thesis compares three E2ERE paradigms in biomedicine, focusing on rare diseases with discontinuous and nested entities. We evaluate pipelines that chain named entity recognition (NER) into relation extraction (RE), sequence-to-sequence models, and generative pre-trained transformer (GPT) models using the RareDis information extraction dataset. Our findings indicate that pipeline models are the most effective, followed closely by sequence-to-sequence models. GPT models, despite having eight times as many parameters, perform worse than sequence-to-sequence models and …