Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Data Science

William & Mary

Undergraduate Honors Theses

2024

Articles 1 - 3 of 3

Full-Text Articles in Physical Sciences and Mathematics

Security And Interpretability In Large Language Models, Lydia Danas May 2024

Undergraduate Honors Theses

Large Language Models (LLMs) can model long-term dependencies in sequences of tokens and are consequently often used to generate text through language modeling. These capabilities are increasingly being used for code generation tasks; however, LLM-powered code generation tools such as GitHub's Copilot have been generating insecure code and thus pose a cybersecurity risk. To generate secure code, we must first understand why LLMs generate insecure code. This non-trivial task can be approached through interpretability methods, which investigate the hidden state of a neural network to explain model outputs. A new interpretability method is rationales, which obtains …


Roads And Corresponding Travel Time To Markets: Assessing Climate Vulnerability In Nepal, Kaitlyn Crowley May 2024

Undergraduate Honors Theses

Roads exist as a physical and theoretical connection between people and places around the globe. In addition to providing a route from one point to another, roads are also an indicator of access to markets and of poverty. However, current road datasets, particularly the Global Roads Open Access Data Set, are out of date or incomplete, necessitating new sources of data for analyses involving road networks. This study explores the relationship between climate change and access to markets in Nepal. We seek to identify isolated communities that are likely to experience detrimental outcomes associated with environmental threats, such as increasing …


Code Syntax Understanding In Large Language Models, Cole Granger May 2024

Undergraduate Honors Theses

In recent years, automated software engineering tasks have been carried out using Large Language Models trained on source code, such as Seq2Seq, LSTM, GPT, T5, BART, and BERT. The inherent textual nature of source code allows it to be represented as a sequence of sub-words (or tokens), drawing parallels to prior work in NLP. Although these models have shown promising results according to established metrics (e.g., BLEU, CodeBLEU), there remains a deeper question about how much syntax knowledge they truly acquire when trained and fine-tuned for specific tasks.

To address this question, this thesis introduces a taxonomy of syntax …