Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Data Science

William & Mary

2023

Articles 1 - 7 of 7

Full-Text Articles in Entire DC Network

Parameter Estimation For Patient Enrollment In Clinical Trials, Junyan Liu Dec 2023

Undergraduate Honors Theses

In this paper, we study the Poisson-gamma model for recruitment time in clinical trials. We prove several properties of this model that match our intuitions from a reliability perspective, run simulations of the model, and use different optimization methods to estimate its parameters. Although the optimization methods behaved unfavorably and unstably, we identify conditions under which this occurs, offer potential explanations for the phenomenon, and provide further insights into the Poisson-gamma model.


A Language Framework For Modeling Social Media Account Behavior, Alexander C. Nwala, Alessandro Flammini, Filippo Menczer Aug 2023

Arts & Sciences Articles

Malicious actors exploit social media to inflate stock prices, sway elections, spread misinformation, and sow discord. To these ends, they employ tactics that include the use of inauthentic accounts and campaigns. Methods to detect these abuses currently rely on features specifically designed to target suspicious behaviors. However, the effectiveness of these methods decays as malicious behaviors evolve. To address this challenge, we propose a language framework for modeling social media account behaviors. Words in this framework, called BLOC, consist of symbols drawn from distinct alphabets representing user actions and content. Languages from the framework are highly flexible and can be …


Seeing What We Can't: Evaluating Implicit Biases In Deep Learning Satellite Imagery Models Trained For Poverty Prediction, Joseph O'Brien May 2023

Undergraduate Honors Theses

Previous studies have sought to use Convolutional Neural Networks for regional estimation of poverty levels. However, there is limited research into possible implicit biases in deep neural networks in the context of satellite imagery. In this work, we develop a deep learning model to predict the tertile of per-capita asset consumption, trained on satellite imagery and World Bank Living Standards Measurements Study data. Using satellite imagery collected via survey location data as inputs, we use transfer learning to train a VGG-16 Convolutional Neural Network to classify images based on per-capita consumption. The model achieves an $R^2$ of 0.74, using thousands …


A Satellite Imagery Approach To Estimating Migratory Flows In Guatemala Using Convolutional Neural Networks, Sarah Larimer May 2023

Undergraduate Honors Theses

Being able to predict migratory flows is important in ensuring political, social, and economic stability. In the wake of violence, unrest, natural disasters, and social pressures, millions of migrants have fled Central America in search of a better life. However, due to the infrequent nature and high cost of census data, there is a need for more remote and up-to-date approaches. Convolutional Neural Networks offer a computer-vision-based approach that is cheaper and has significantly less lag. In this study, we seek to evaluate the effectiveness of different convolutional neural networks in predicting …


Considering The Accuracy Of Fiat Boundaries: Ontology And Quantification, Lydia Troup May 2023

Undergraduate Honors Theses

Administrative boundaries - i.e., states, counties, or districts - are fiat boundaries; they exist purely as defined by human interpretation. Because of this, and despite their critical importance to government functions, the accuracy of data products claiming to represent such boundaries is difficult to measure. Here, I explore this topic using three boundary data sets: the open source geoBoundaries data set, the humanitarian UN OCHA’s Common Operational Datasets (COD), and Esri’s commercial administrative divisions 0 and 1 data sets in the Living Atlas. The accuracy of each was quantified as the percent overlap between each data set and an authoritative …


Identifying Social Media Users That Are Susceptible To Phishing Attacks, Zoe Metzger May 2023

Undergraduate Honors Theses

Phishing scams are a billion-dollar problem. According to Threatpost, in 2020, business email compromise phishing attacks cost the US economy $1.8 billion. Social media phishing scams are also on the rise, with 74% of companies experiencing social media attacks in 2021 according to Proofpoint. Educating users about phishing scams is an effective strategy for reducing phishing attacks. Despite efforts to combat phishing, the number of attacks continues to rise, which likely indicates users' reluctance to change their online behaviors. Existing research into predicting vulnerable social media users that are susceptible to phishing mostly focuses on content analysis of …


Predicting Micronutrient Deficiency With Publicly Available Satellite Data, Elizabeth Bondi-Kelly, Haipeng Chen, Christopher D. Golden, Nikhil Behari, Milind Tambe Mar 2023

Arts & Sciences Articles

Micronutrient deficiency (MND), which is a form of malnutrition that can have serious health consequences, is difficult to diagnose in early stages without blood draws, which are expensive and time-consuming to collect and process. It is even more difficult at a public health scale seeking to identify regions at higher risk of MND. To provide data more widely and frequently, we propose an accurate, scalable, low-cost, and interpretable regional-level MND prediction system. Specifically, our work is the first to use satellite data, such as forest cover, weather, and presence of water, to predict deficiency of micronutrients such as iron, Vitamin …