Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Data Science

William & Mary

Theses/Dissertations

2023

Articles 1 - 5 of 5

Full-Text Articles in Physical Sciences and Mathematics

Parameter Estimation For Patient Enrollment In Clinical Trials, Junyan Liu Dec 2023

Undergraduate Honors Theses

In this paper, we study the Poisson-gamma model for recruitment time in clinical trials. We prove several properties of this model that match our intuitions from a reliability perspective, run simulations of the model, and use different optimization methods to estimate its parameters. Although the optimization methods behaved unfavorably and unstably, we identify certain conditions and provide potential explanations for this phenomenon, along with further insights into the Poisson-gamma model.
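
As a rough illustration of the kind of simulation the abstract mentions, the sketch below uses one common formulation of a Poisson-gamma recruitment model: each center's enrollment rate is drawn from a gamma distribution, and patients then arrive at each center as a Poisson process. The abstract does not give the thesis's parameterization, so the function names, the shape/rate convention, and the example values are all assumptions made for this example.

```python
import random


def simulate_recruitment_time(n_patients, n_centers, alpha, beta, rng):
    """Simulate the time needed to enroll n_patients across n_centers.

    Assumed model (not taken from the thesis): each center's rate is
    Gamma(shape=alpha, rate=beta), and arrivals at a center form a
    Poisson process with that rate.
    """
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    rates = [rng.gammavariate(alpha, 1.0 / beta) for _ in range(n_centers)]
    total_rate = sum(rates)
    # Independent Poisson processes superpose into one Poisson process
    # with the summed rate; its n-th arrival time is Gamma(n, 1/total_rate).
    return rng.gammavariate(n_patients, 1.0 / total_rate)


def mean_recruitment_time(n_patients, n_centers, alpha, beta, n_runs, seed=0):
    """Average the simulated recruitment time over n_runs replications."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        total += simulate_recruitment_time(n_patients, n_centers, alpha, beta, rng)
    return total / n_runs


if __name__ == "__main__":
    # Hypothetical values: 100 patients, 20 centers, Gamma(2, 1) rates.
    print(mean_recruitment_time(100, 20, alpha=2.0, beta=1.0, n_runs=20000))
```

Under these assumptions the summed rate is Gamma(n_centers * alpha, rate=beta), so the expected recruitment time is n_patients * beta / (n_centers * alpha - 1), which gives a simple check on the simulation.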


Seeing What We Can't: Evaluating Implicit Biases In Deep Learning Satellite Imagery Models Trained For Poverty Prediction, Joseph O'Brien May 2023

Undergraduate Honors Theses

Previous studies have sought to use Convolutional Neural Networks for regional estimation of poverty levels. However, there is limited research into possible implicit biases in deep neural networks in the context of satellite imagery. In this work, we develop a deep learning model to predict the tertile of per-capita asset consumption, trained on satellite imagery and World Bank Living Standards Measurement Study data. Using satellite imagery collected via survey location data as inputs, we use transfer learning to train a VGG-16 Convolutional Neural Network to classify images based on per-capita consumption. The model achieves an $R^2$ of 0.74, using thousands …


A Satellite Imagery Approach To Estimating Migratory Flows In Guatemala Using Convolutional Neural Networks, Sarah Larimer May 2023

Undergraduate Honors Theses

Being able to predict migratory flows is important in ensuring political, social, and economic stability. In the wake of violence, unrest, natural disasters, and social pressures, millions of migrants have fled Central America in search of a better life. However, because census data are collected infrequently and at high cost, there is a need for more remote and up-to-date approaches. Convolutional Neural Networks offer a computer-vision-based approach that is cheaper and has significantly less lag. In this study, we seek to evaluate the effectiveness of different convolutional neural networks in predicting …


Identifying Social Media Users That Are Susceptible To Phishing Attacks, Zoe Metzger May 2023

Undergraduate Honors Theses

Phishing scams are a billion-dollar problem. According to Threatpost, in 2020, business email compromise phishing attacks cost the US economy $1.8 billion. Social media phishing scams are also on the rise, with 74% of companies experiencing social media attacks in 2021 according to Proofpoint. Educating users about phishing scams is an effective strategy for reducing phishing attacks. Despite efforts to combat phishing, the number of attacks continues to rise, likely indicating users' reluctance to change online behaviors. Existing research into predicting which social media users are susceptible to phishing mostly focuses on content analysis of …


Considering The Accuracy Of Fiat Boundaries: Ontology And Quantification, Lydia Troup May 2023

Undergraduate Honors Theses

Administrative boundaries (i.e., states, counties, or districts) are fiat boundaries; they exist purely as defined by human interpretation. Because of this, and despite their critical importance to government functions, the accuracy of data products claiming to represent such boundaries is difficult to measure. Here, I explore this topic using three boundary data sets: the open-source geoBoundaries data set, the humanitarian UN OCHA’s Common Operational Datasets (COD), and Esri’s commercial administrative divisions 0 and 1 data sets in the Living Atlas. The accuracy of each was quantified as the percent overlap between each data set and an authoritative …