Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Information Security

William & Mary

Undergraduate Honors Theses

Theses/Dissertations

Full-Text Articles in Physical Sciences and Mathematics

Security And Interpretability In Large Language Models, Lydia Danas, May 2024

Undergraduate Honors Theses

Large Language Models (LLMs) can model long-term dependencies in token sequences and are therefore widely used to generate text through language modeling. These capabilities are increasingly applied to code generation; however, LLM-powered code generation tools such as GitHub's Copilot have been generating insecure code and thus pose a cybersecurity risk. To generate secure code, we must first understand why LLMs generate insecure code. This non-trivial task can be approached through interpretability methods, which examine the hidden states of a neural network to explain its outputs. One new interpretability method is rationales, which obtains …