Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Computer Sciences

William & Mary

Undergraduate Honors Theses

Articles 1 - 14 of 14

Full-Text Articles in Physical Sciences and Mathematics

Modeling The Neutral Densities Of SPARC Using A Python Version Of KN1D, Gwendolyn R. Galleher May 2024

Undergraduate Honors Theses

Currently, neutral recycling is a crucial contributor to fueling the plasma within tokamaks. However, Commonwealth Fusion Systems’ SPARC tokamak is expected to be more opaque to neutrals. Thus, we anticipate that the role of neutral recycling in fueling will decrease. Since SPARC is predicted to have a groundbreaking fusion power gain ratio of Q ≈ 10, we must have a concrete understanding of the opacity and whether or not alternative fueling practices must be included. To develop said understanding, we produced neutral density profiles via KN1DPy, a 1D kinetic neutral transport code for atomic and molecular hydrogen in an ionizing …


Security And Interpretability In Large Language Models, Lydia Danas May 2024

Undergraduate Honors Theses

Large Language Models (LLMs) have the capability to model long-term dependencies in sequences of tokens, and are consequently often utilized to generate text through language modeling. These capabilities are increasingly being used for code generation tasks; however, LLM-powered code generation tools such as GitHub's Copilot have been generating insecure code and thus pose a cybersecurity risk. To generate secure code we must first understand why LLMs are generating insecure code. This non-trivial task can be realized through interpretability methods, which investigate the hidden state of a neural network to explain model outputs. A new interpretability method is rationales, which obtains …


Improving The Scalability Of Neural Network Surface Code Decoders, Kevin Wu May 2024

Undergraduate Honors Theses

Quantum computers have recently gained significant recognition due to their ability to solve problems intractable to classical computers. However, due to difficulties in building actual quantum computers, they have large error rates. Thus, advancements in quantum error correction are urgently needed to improve both their reliability and scalability. Here, we first present a type of topological quantum error correction code called the surface code, and we discuss recent developments and challenges of creating neural network decoders for surface codes. In particular, the amount of training data needed to reach the performance of algorithmic decoders grows exponentially with the size of …


Code Syntax Understanding In Large Language Models, Cole Granger May 2024

Undergraduate Honors Theses

In recent years, tasks for automated software engineering have been achieved using Large Language Models trained on source code, such as Seq2Seq, LSTM, GPT, T5, BART and BERT. The inherent textual nature of source code allows it to be represented as a sequence of sub-words (or tokens), drawing parallels to prior work in NLP. Although these models have shown promising results according to established metrics (e.g., BLEU, CODEBLEU), there remains a deeper question about the extent of syntax knowledge they truly grasp when trained and fine-tuned for specific tasks.

To address this question, this thesis introduces a taxonomy of syntax …


Evaluating Large Language Model Performance On Haskell, Andrew Chen May 2024

Undergraduate Honors Theses

I introduce HaskellEval, a Haskell evaluation benchmark for Large Language Models. HaskellEval’s curation leverages a novel synthetic generation framework, streamlining the process of dataset curation by minimizing manual intervention. The core of this research is an extensive analysis of the trustworthiness of synthetic generations, ensuring accuracy, realism, and diversity. Additionally, I provide a comprehensive evaluation of existing open-source models on HaskellEval.


Power Profiling Smart Home Devices, Kailai Cui May 2023

Undergraduate Honors Theses

In recent years, the growing market for smart home devices has raised concerns about user privacy and security. Previous works have utilized power auditing measures to infer activity of IoT devices to mitigate security and privacy threats.

In this thesis, we explore the potential of extracting information from the power consumption traces of smart home devices. We present a framework that collects smart home devices’ power traces with current sensors and preprocesses them for effective inference. We collect an extensive dataset, more than 2 hours in duration, from 6 devices, including smart speakers, a smart camera, and a smart display. We perform different classification tasks …


Kfactorvae: Self-Supervised Regularization For Better A.I. Disentanglement, Joseph S. Lee May 2023

Undergraduate Honors Theses

Obtaining disentangled representations is a sought-after goal for making A.I. models more interpretable. Studies have proven the impossibility of obtaining these kinds of representations with just unsupervised learning, or in other words, without strong inductive biases. One strong inductive bias is a regularization term that encourages the invariance of factors of variation across an image and a carefully selected augmentation. In this thesis, we build upon the existing Variational Autoencoder (VAE)-based disentanglement literature by utilizing the aforementioned inductive bias. We evaluate our method on the dSprites dataset, a well-known benchmark, and demonstrate its ability to achieve comparable or higher …


Identifying Social Media Users That Are Susceptible To Phishing Attacks, Zoe Metzger May 2023

Undergraduate Honors Theses

Phishing scams are a billion-dollar problem. According to Threatpost, in 2020, business email compromise phishing attacks cost the US economy $1.8 billion. Social media phishing scams are also on the rise, with 74% of companies experiencing social media attacks in 2021 according to Proofpoint. Educating users about phishing scams is an effective strategy for reducing phishing attacks. Despite efforts to combat phishing, the number of attacks continues to rise, likely indicating users' reluctance to change online behaviors. Existing research into predicting which social media users are susceptible to phishing mostly focuses on content analysis of …


Quantum Federated Learning: Training Hybrid Neural Networks Collaboratively, Anneliese Brei May 2022

Undergraduate Honors Theses

This thesis explores basic concepts of machine learning, neural networks, federated learning, and quantum computing in an effort to better understand Quantum Machine Learning, an emerging field of research. We propose Quantum Federated Learning (QFL), a schema for collaborative distributed learning that maintains privacy and low communication costs. We demonstrate the QFL framework and local and global update algorithms with implementations that utilize TensorFlow Quantum libraries. Our experiments test the effectiveness of frameworks of different sizes. We also test the effect of changing the number of training cycles and changing distribution of training data. This thesis serves as a synoptic …


Static And Dynamic Analysis In Cryptographic-API Misuse Detection Of Mobile Application, Kunyang Li Dec 2021

Undergraduate Honors Theses

With Android devices becoming more advanced and gaining more popularity, the number of cryptographic-API misuses in mobile applications is escalating. Numerous snippets of code in Android are from Stack Overflow and over 90% of them contain several crypto-issues. Various crypto-misuse detectors have emerged that aim to report app vulnerabilities and better secure users’ privacy. These detectors can be broadly classified into two categories based on the analysis strategies employed to catch misuses: static analysis (i.e., scanning the code base) and dynamic analysis (i.e., executing the code). However, there has not been enough research comparing their underlying differences, …


A Pain Free Nociceptor: Predicting Football Injuries With Machine Learning, Andrew Lyubovsky May 2021

Undergraduate Honors Theses

Injuries are a significant aspect of every sport, with the ability to impact a player’s career and the success of a team in their season. Because sensor data can capture a player’s physical state, it has recently been analyzed for its ability to predict player injuries. We inspect the predictive power of player stats, subjective player responses, GPS data, and training load data in forecasting game injuries from an NCAA American football team during the 2019 season. Data processing techniques are used to remove noise and decrease correlated data, and as large portions of the data …


Molecular Cluster Fragment Machine Learning Training Techniques To Predict Energetics Of Brown Carbon Aerosol Clusters, Emily E. Chappie May 2021

Undergraduate Honors Theses

Density functional theory (DFT) has become a popular method for computational work involving larger molecular systems as it provides accuracy that rivals ab initio methods while lowering computational cost. Nevertheless, computational cost is still high for systems larger than ten atoms, preventing its application in modeling realistic atmospheric systems at the molecular level. Machine learning techniques, however, show promise as cost-effective tools in predicting chemical properties when properly trained. In the interest of furthering chemical machine learning in the field of atmospheric science, I have developed a training method for predicting cluster energetics of newly characterized nitrogen-based brown …


Performance Implications Of Memory Affinity On Filesystem Caches In A Non-Uniform Memory Access Environment, Jacob Adams May 2021

Undergraduate Honors Theses

Non-Uniform Memory Access imposes unique challenges on every component of an operating system and the applications that run on it. One such component is the filesystem which, while not directly impacted by NUMA in most cases, typically has some form of cache whose performance is constrained by the latency and bandwidth of the memory that it is stored in. One such filesystem is ZFS, which contains its own custom caching system, known as the Adaptive Replacement Cache. This work looks at the impact of NUMA on this cache via sequential read operations, shows how current solutions intended to reduce this …


Scope: Building And Testing An Integrated Manual-Automated Event Extraction Tool For Online Text-Based Media Sources, Matthew Crittenden May 2021

Undergraduate Honors Theses

Building on insights from two years of manually extracting events information from online news media, an interactive information extraction environment (IIEE) was developed. SCOPE, the Scientific Collection of Open-source Policy Evidence, is a Python Django-based tool divided across specialized modules for extracting structured events data from unstructured text. These modules are grouped into a flexible framework which enables the user to tailor the tool to meet their needs. Following principles of user-oriented learning for information extraction (IE), SCOPE offers an alternative approach to developing AI-assisted IE systems. In this piece, we detail the ongoing development of the SCOPE tool, present …