Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Full-Text Articles in Physical Sciences and Mathematics

Code Syntax Understanding In Large Language Models, Cole Granger May 2024

Undergraduate Honors Theses

In recent years, tasks for automated software engineering have been achieved using Large Language Models trained on source code, such as Seq2Seq, LSTM, GPT, T5, BART and BERT. The inherent textual nature of source code allows it to be represented as a sequence of sub-words (or tokens), drawing parallels to prior work in NLP. Although these models have shown promising results according to established metrics (e.g., BLEU, CODEBLEU), there remains a deeper question about the extent of syntax knowledge they truly grasp when trained and fine-tuned for specific tasks.

To address this question, this thesis introduces a taxonomy of syntax …


Security And Interpretability In Large Language Models, Lydia Danas May 2024

Undergraduate Honors Theses

Large Language Models (LLMs) have the capability to model long-term dependencies in sequences of tokens, and are consequently often utilized to generate text through language modeling. These capabilities are increasingly being used for code generation tasks; however, LLM-powered code generation tools such as GitHub's Copilot have been generating insecure code and thus pose a cybersecurity risk. To generate secure code we must first understand why LLMs are generating insecure code. This non-trivial task can be realized through interpretability methods, which investigate the hidden state of a neural network to explain model outputs. A new interpretability method is rationales, which obtains …


Roads And Corresponding Travel Time To Markets: Assessing Climate Vulnerability In Nepal, Kaitlyn Crowley May 2024

Undergraduate Honors Theses

Roads exist as a physical and theoretical connection between people and places around the globe. In addition to providing a route from one point to another, roads are also an indicator of access to markets and of poverty. However, current road datasets, particularly the Global Roads Open Access Data Set, are out of date or incomplete, necessitating new sources of data for analyses involving road networks. This study explores the relationship between climate change and access to markets in Nepal. We seek to identify isolated communities that are likely to experience detrimental outcomes associated with environmental threats, such as increasing …


Parameter Estimation For Patient Enrollment In Clinical Trials, Junyan Liu Dec 2023

Undergraduate Honors Theses

In this paper, we study the Poisson-gamma model for recruitment time in clinical trials. We proved several properties of this model that match our intuitions from a reliability perspective, ran simulations of the model, and used different optimization methods to estimate its parameters. Although the behaviors of the optimization methods were unfavorable and unstable, we identified certain conditions and provided potential explanations for this phenomenon and further insights into the Poisson-gamma model.
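
For readers unfamiliar with the model, the sketch below illustrates one common formulation of Poisson-gamma recruitment; it is an assumption, not taken from the thesis. Each site's enrollment rate is drawn from a gamma distribution, and, given those rates, patients arrive at each site as independent Poisson processes. The function name `simulate_enrollment_time` and its parameters (`n_sites`, `target`, `alpha`, `beta`) are hypothetical and chosen only for illustration.

```python
import random


def simulate_enrollment_time(n_sites, target, alpha, beta, seed=0):
    """Simulate the time needed to enroll `target` patients across `n_sites` sites.

    Assumed formulation (not confirmed by the thesis): each site's rate is
    drawn from Gamma(shape=alpha, rate=beta). Given the rates, arrivals at the
    sites are independent Poisson processes, so the pooled arrivals form a
    Poisson process whose rate is the sum of the site rates.
    """
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    rates = [rng.gammavariate(alpha, 1.0 / beta) for _ in range(n_sites)]
    total_rate = sum(rates)

    # Inter-arrival times of a Poisson process are exponential with the pooled rate.
    elapsed = 0.0
    for _ in range(target):
        elapsed += rng.expovariate(total_rate)
    return elapsed
```

Averaging the returned time over many seeds gives a Monte Carlo estimate of the expected recruitment time. Under these assumptions it can be compared with the closed form `target * beta / (n_sites * alpha - 1)`, which holds when `n_sites * alpha > 1`.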


Exploration And Statistical Modeling Of Profit, Caleb Gibson Dec 2023

Undergraduate Honors Theses

For any company involved in sales, maximization of profit is the driving force that guides all decision-making. Many factors can influence how profitable a company can be, including external factors like changes in inflation or consumer demand and internal factors like pricing and product cost. By understanding specific trends in its own internal data, a company can readily identify problem areas or potential growth opportunities to help increase profitability.

In this discussion, we use an extensive data set to examine how a company might analyze its own data to identify potential changes it might investigate to drive better performance. Based …


Algorithmic Bias: Causes And Effects On Marginalized Communities, Katrina M. Baha May 2023

Undergraduate Honors Theses

Individuals from marginalized backgrounds face different healthcare outcomes due to algorithmic bias in the technological healthcare industry. Algorithmic biases, which are the biases that arise from the set of steps used to solve or analyze a problem, are evident when people from marginalized communities use healthcare technology. For example, many pulse oximeters, which are the medical devices used to measure oxygen saturation in the blood, are not able to accurately read people who have darker skin tones. Thus, people with darker skin tones are not able to receive proper health care due to their pulse oximetry data being inaccurate. This …


Seeing What We Can't: Evaluating Implicit Biases In Deep Learning Satellite Imagery Models Trained For Poverty Prediction, Joseph O'Brien May 2023

Undergraduate Honors Theses

Previous studies have sought to use Convolutional Neural Networks for regional estimation of poverty levels. However, there is limited research into possible implicit biases in deep neural networks in the context of satellite imagery. In this work, we develop a deep learning model to predict the tertile of per-capita asset consumption, trained on satellite imagery and World Bank Living Standards Measurements Study data. Using satellite imagery collected via survey location data as inputs, we use transfer learning to train a VGG-16 Convolutional Neural Network to classify images based on per-capita consumption. The model achieves an $R^2$ of .74, using thousands …


A Satellite Imagery Approach To Estimating Migratory Flows In Guatemala Using Convolutional Neural Networks, Sarah Larimer May 2023

Undergraduate Honors Theses

Being able to predict migratory flows is important in ensuring political, social, and economic stability. In the wake of violence, unrest, natural disasters, and social pressures, millions of migrants have fled Central America in search of a better life. However, due to the infrequent nature and high cost of census data, there is a need for more remote and up-to-date approaches. Convolutional Neural Networks offer a computer-vision-based approach that is cheaper and has significantly less lag. In this study, we seek to evaluate the effectiveness of different convolutional neural networks in predicting …


Identifying Social Media Users That Are Susceptible To Phishing Attacks, Zoe Metzger May 2023

Undergraduate Honors Theses

Phishing scams are a billion-dollar problem. According to Threatpost, in 2020, business email compromise phishing attacks cost the US economy $1.8 billion. Social media phishing scams are also on the rise, with 74% of companies experiencing social media attacks in 2021 according to Proofpoint. Educating users about phishing scams is an effective strategy for reducing phishing attacks. Despite efforts to combat phishing, the number of attacks continues to rise, likely indicative of a reticence of users to change online behaviors. Existing research into predicting which social media users are susceptible to phishing mostly focuses on content analysis of …


Considering The Accuracy Of Fiat Boundaries: Ontology And Quantification, Lydia Troup May 2023

Undergraduate Honors Theses

Administrative boundaries - i.e., states, counties, or districts - are fiat boundaries; they exist purely as defined by human interpretation. Because of this, and despite their critical importance to government functions, the accuracy of data products claiming to represent such boundaries is difficult to measure. Here, I explore this topic using three boundary data sets: the open source geoBoundaries data set, the humanitarian UN OCHA’s Common Operational Datasets (COD), and Esri’s commercial administrative divisions 0 and 1 data sets in the Living Atlas. The accuracy of each was quantified as the percent overlap between each data set and an authoritative …


Using Deep Learning With Satellite Imagery To Estimate Deforestation Rates, Maeve Naughton-Rockwell May 2022

Undergraduate Honors Theses

Previous studies have used Convolutional Neural Networks for regional detection of deforestation breaks. However, there is limited research into the capability of deep neural networks to identify sudden shifts in global forest cover from satellite imagery. Additionally, many deforestation detection models are trained on region specific data and need manual input thresholds. In this work, we develop a deep learning model to predict the percent of deforestation in a region between two points in time, trained on globally sourced data. Using the before and after satellite images of a deforestation event as inputs, we implemented a two input Convolutional Neural …


Intraday Algorithmic Trading Using Momentum And Long Short-Term Memory Network Strategies, Andrew R. Whitinger II May 2022

Undergraduate Honors Theses

Intraday stock trading is an infamously difficult and risky strategy. Momentum and reversal strategies and long short-term memory (LSTM) neural networks have been shown to be effective for selecting stocks to buy and sell over time periods of multiple days. To explore whether these strategies can be effective for intraday trading, their implementations were simulated using intraday price data for stocks in the S&P 500 index, collected at 1-second intervals between February 11, 2021 and March 9, 2021 inclusive. The study tested 160 variations of momentum and reversal strategies for profitability in long, short, and market-neutral portfolios, totaling 480 portfolios. …


Using A Machine Learning Model To Predict Plant Inflorescences Based Upon Its Soil Microbiome, Luke Denoncourt May 2022

Undergraduate Honors Theses

The UN estimates that the global population could reach 9.7 billion by 2050 (United Nations). As a result, the amount of food required to feed humanity is expected to double by 2050 (Ray et al., 2012). Humanity must find a way to increase crop production without increasing fertilizer usage and eutrophication, which can be done using the soil microbiome. In potted plants with soils inoculated with Pseudomonas alcaligenes, Pseudomonas denitrificans, Bacillus polymyxa, and Mycobacterium phlei, both the shoot and root growth of pea and cotton plants increased significantly (Egamberdieva & Höflich, 2004). In this study, utilizing a random forest …


The Pandemic From Above: Estimating Covid-19 Cases Using Deep Learning And Satellite Imagery, John Hennin Apr 2022

Undergraduate Honors Theses

Monitoring the spread of an outbreak of disease (such as COVID-19) is an important component of any coordinated pandemic response. Across the globe, our ability to conduct such monitoring - especially at early stages of the COVID-19 pandemic - was highly limited due to a lack of public reporting mechanisms. Today, the process of case data collection remains expensive and, in some regions, is subject to political considerations. Researchers have turned to some techniques leveraging Google Trends and Twitter data to overcome limitations in public data sources. Here, we provide another approach which leverages satellite information to provide estimates …


Machine Learning In Healthcare: Improving The Diagnosis Of Pulmonary Embolism In Covid-19 Patients, Soheb Osmani Apr 2022

Undergraduate Honors Theses

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has created new challenges for clinicians diagnosing pulmonary embolism (PE). Clinicians currently rely on D-Dimer levels in conjunction with clinical prediction scores to rule out and diagnose PE. However, patients with COVID-19 (the disease caused by SARS-CoV-2) often present with elevated D-Dimer levels. D-Dimer levels in COVID-19 patients have been found to be positively correlated with the severity of disease. Symptoms of COVID-19 also often align with symptoms of PE. Therefore, it becomes more difficult for clinicians to identify which COVID-19 positive patients should undergo further testing for PE. This study evaluates …


Molecular Cluster Fragment Machine Learning Training Techniques To Predict Energetics Of Brown Carbon Aerosol Clusters, Emily E. Chappie May 2021

Undergraduate Honors Theses

Density functional theory (DFT) has become a popular method for computational work involving larger molecular systems as it provides accuracy that rivals ab initio methods while lowering computational cost. Nevertheless, computational cost is still high for systems greater than ten atoms in size, preventing their application in modeling realistic atmospheric systems at the molecular level. Machine learning techniques, however, show promise as cost-effective tools in predicting chemical properties when properly trained. In the interest of furthering chemical machine learning in the field of atmospheric science, I have developed a training method for predicting cluster energetics of newly characterized nitrogen-based brown …


Scope: Building And Testing An Integrated Manual-Automated Event Extraction Tool For Online Text-Based Media Sources, Matthew Crittenden May 2021

Undergraduate Honors Theses

Building on insights from two years of manually extracting events information from online news media, an interactive information extraction environment (IIEE) was developed. SCOPE, the Scientific Collection of Open-source Policy Evidence, is a Python Django-based tool divided across specialized modules for extracting structured events data from unstructured text. These modules are grouped into a flexible framework which enables the user to tailor the tool to meet their needs. Following principles of user-oriented learning for information extraction (IE), SCOPE offers an alternative approach to developing AI-assisted IE systems. In this piece, we detail the ongoing development of the SCOPE tool, present …


Use Of Lymesim 2.0 To Assess The Potential For Single And Integrated Management Methods To Control Blacklegged Ticks (Ixodes Scapularis; Acari: Ixodidae) And Transmission Of Lyme Disease Spirochetes, Shravani Chitineni, Elizabeth R. Gleim, Holly D. Gaff Jan 2021

Undergraduate Honors Theses

Annual Lyme disease cases continue to rise in the U.S. making it the most reported vector-borne illness in the country. The pathogen (Borrelia burgdorferi) and primary vector (Ixodes scapularis; blacklegged tick) dynamics of Lyme disease are complicated by the multitude of vertebrate hosts and varying environmental factors, making models an ideal tool for exploring disease dynamics in a time- and cost-effective way. In the current study, LYMESIM 2.0, a mechanistic model, was used to explore the effectiveness of three commonly used tick control methods: habitat-targeted acaricide (spraying), rodent-targeted acaricide (bait boxes), and white-tailed deer targeted acaricide (4-poster …


An Analytical Examination On The Effects Of Vegetarian And Omnivorous Diets On C-Reactive Protein, Aletha Kleis Jan 2020

Undergraduate Honors Theses

There is a lack of research regarding how following a vegetarian or omnivorous diet affects C-Reactive Protein (CRP) levels of people as seen through results from an analysis of data gathered from the National Health and Nutrition Examination Survey (NHANES). The level of CRP is a reflection of how much inflammation there is in one’s body and is a popular indicator of risk for heart disease. Thus, in this research, I use the NHANES data to look at the relationship of CRP levels of people who identified themselves as vegetarian or not, while also considering the general healthiness of each …