Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons



Full-Text Articles in Physical Sciences and Mathematics

The Effect Of Age, Syntax Complexity, And Cognitive Ability On The Rate Of Semantic Illusions, Sara Anne Goring Jan 2023


CGU Theses & Dissertations

Semantic illusions are recognition errors that occur when an individual fails to notice that information contradicts their prior knowledge (Barton & Sanford, 1993; Erickson & Mattson, 1981). For example, after hearing the question, “If a plane crashes while flying over state lines, where should the survivors be buried?” many start to consider the legality or appropriateness of the scenario despite knowing “survivors” should not be buried. Having more knowledge does not necessarily prevent individuals from overlooking illusory information/misinformation. Older adults tend to have greater crystallized intelligence than young adults, yet these age groups appear to detect illusory information at equivalent …


Application Of Sentiment Analysis And Machine Learning Techniques To Predict Daily Cryptocurrency Price Returns, Edward Wu Jan 2023


CMC Senior Theses

This paper examines the effects of social media sentiment relating to Bitcoin on the daily price returns of Bitcoin and other popular cryptocurrencies by utilizing sentiment analysis and machine learning techniques to predict daily price returns. Many investors think that social media sentiment affects cryptocurrency prices. However, the results of this paper find that social media sentiment relating to Bitcoin does not add significant predictive value to forecasting daily price returns for each of the six cryptocurrencies used for analysis and that machine learning models that do not assume linearity between the current day price return and previous daily price …


Beginner's Analysis Of Financial Stochastic Process Models, David Garcia Jan 2023


HMC Senior Theses

This thesis explores the use of geometric Brownian motion (GBM) as a financial model for predicting stock prices. The model is first introduced and its assumptions and limitations are discussed. Then, it is shown how to simulate GBM in order to predict stock price values. The performance of the GBM model is then evaluated in two different periods of time to determine whether its accuracy differs before and after March 23, 2020.
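As an illustration of the simulation step mentioned in the abstract, the following is a minimal sketch of one way to generate a GBM price path, using the standard exact log-normal update S(t+dt) = S(t) * exp((mu - sigma^2/2) dt + sigma sqrt(dt) Z). The function name `simulate_gbm` and all parameter values are assumptions chosen for illustration; they are not taken from the thesis.

```python
import math
import random


def simulate_gbm(s0, mu, sigma, horizon, steps, seed=None):
    """Simulate one GBM price path using the exact log-normal update.

    s0: starting price, mu: drift, sigma: volatility,
    horizon: total time in years, steps: number of time steps.
    """
    rng = random.Random(seed)
    dt = horizon / steps
    path = [s0]
    for _ in range(steps):
        z = rng.gauss(0.0, 1.0)  # standard normal shock
        growth = (mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z
        path.append(path[-1] * math.exp(growth))
    return path


# Hypothetical parameters for illustration only (not from the thesis).
path = simulate_gbm(s0=100.0, mu=0.05, sigma=0.2, horizon=1.0, steps=252, seed=1)
print(len(path), path[-1] > 0)
```

Setting `sigma` to zero removes the random term, so the path reduces to deterministic exponential growth, which gives a quick sanity check on the update rule.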


Using Short Bursts To Optimize Redistricting In Georgia, Vedika Vishweshwar Jan 2022


CMC Senior Theses

Identifying extreme outliers in large state spaces is a difficult problem. I consider this problem in the context of finding political districting plans that maximize the number of districts in which the majority of the population is from a minority group, such as African Americans. Since the set of all possible districting plans is enormous and unfeasible to examine in practice, this paper proposes a sampling method to find these outlying plans. Specifically, this paper experiments with short bursts in the context of minority voting rights in Georgia. Short bursts are a type of Markov Chain in …


Containing Compounding Container Congestion, Curtis Salinger Jan 2022


CMC Senior Theses

The Covid-19 pandemic caused major disruptions throughout the container shipping supply chain. Professor Dongping Song of Liverpool University wrote a paper discussing the logistical vulnerabilities in the supply chain, including the issue of congestion in ports. This paper examines the Port of Los Angeles from 2018-2021 as it relates to Song’s paper to see how its operations were impacted during the Covid-19 timeframe. It is found that labor shortages, chassis shortages, and change in trade behavior each contributed to the congestion. Unfortunately, the implemented policies were insufficient to bolster the port against sustained challenges and congestion continues to worsen.


A Gender And Race Theoretical And Probabilistic Analysis Of The Recent Title Ix Policy Changes, Jordan Wellington Jan 2021


Scripps Senior Theses

On May 6th, 2020, after extensive public comment and review, the Department of Education published the final rule for the new Title IX regulations, which took effect in schools on August 14th. Title IX is the nearly fifty-year-old piece of the Education Amendments that prohibits sexual discrimination in federally funded schools. Several of these changes, such as the inclusion of live hearings and cross examination of witnesses, have been widely criticized by victims’ rights advocates for potentially retraumatizing victims of sexual assault and discouraging students from pursuing a Title IX claim. While the impact of the new regulations …


Uncovering Object Categories In Infant Views, Naiti S. Bhatt Jan 2021


Scripps Senior Theses

While adults recognize objects in a near-instant, infants must learn how to categorize the objects in their visual environments. Recent work has shown that egocentric head-mounted camera videos contain rich data that illuminate the infant experience (Clerkin et al., 2017; Franchak et al., 2011; Yoshida & Smith, 2008). While past work has focused on the social information in view, in this work, we aim to characterize the objects in infants’ at-home visual environments by modifying modern computer vision models for the infant view. To do so, we collected manual annotations of objects that infants seemed to be interacting within a …


Feature Investigation For Stock Returns Prediction Using Xgboost And Deep Learning Sentiment Classification, Seungho (Samuel) Lee Jan 2021


CMC Senior Theses

This paper attempts to quantify the predictive power of social media sentiment and financial data in stock prediction by utilizing a comprehensive set of stock-related fundamental and technical variables and social media sentiments. To conduct sentiment analysis, this study employs a pretrained finBERT model that provides three different sentiment classifications and their respective softmax scores. The significance of these variables is then evaluated with XGBoost regression and Shapley Additive exPlanations (SHAP) frameworks. Through investigating feature importance, this study finds that statistical properties of sentiment variables provide stronger predictive power than a weighted sentiment score and that it is possible to quantify …


An Evaluation Of Knot Placement Strategies For Spline Regression, William Klein Jan 2021


CMC Senior Theses

Regression splines have an established value for producing quality fit at a relatively low-degree polynomial. This paper explores the implications of adopting new methods for knot selection in tandem with established methodology from the current literature. Structural features of generated datasets, as well as residuals collected from sequential iterative models are used to augment the equidistant knot selection process. From analyzing a simulated dataset and an application onto the Racial Animus dataset, I find that a B-spline basis paired with equally-spaced knots remains the best choice when data are evenly distributed, even when structural features of a dataset are known …


Information Prioritization: A Comparison Between Utility Maximizers And Probability Matchers, Yusuf Ismaeel Jan 2021


CMC Senior Theses

This thesis examines the differences between probability matchers and utility maximizers in their preferences for information sources in a lab environment. In this paper, we consider the best source of information to be the most connected one. We conducted several linear probability model type regressions along with logit regressions. Furthermore, we attempted to control for and correct any potential misclassifications of the cognitive strategy by using instrumental variables. The results show that utility maximizers will almost always choose the most informed node. Probability matchers, on the other hand, do not exhibit such behavior as the probability matching strategy …


Using Twitter Api To Solve The Goat Debate: Michael Jordan Vs. Lebron James, Jordan Trey Leonard Jan 2021


CMC Senior Theses

Using a Twitter API, I gather and analyze tweets by performing sentiment analysis to solve the GOAT debate among professional athletes with the primary focus on comparing Michael Jordan and LeBron James. Athletes from the National Football League (NFL), the National Basketball Association (NBA), Major League Baseball (MLB), and the National Collegiate Athletic Association (NCAA) Division 1 Men's and Women's Basketball were selected to compare how sentiment polarity varies across sports. Sentiment polarity is measured by labeling text as "positive", "neutral", or "negative" which allows us to determine which athlete/sport is highly favored among the Twitter community when it comes …


Neither “Post-War” Nor Post-Pregnancy Paranoia: How America’S War On Drugs Continues To Perpetuate Disparate Incarceration Outcomes For Pregnant, Substance-Involved Offenders, Becca S. Zimmerman Jan 2021


Pitzer Senior Theses

This thesis investigates the unique interactions between pregnancy, substance involvement, and race as they relate to the War on Drugs and the hyper-incarceration of women. Using ordinary least squares regression analyses and data from the Bureau of Justice Statistics’ 2016 Survey of Prison Inmates, I examine if (and how) pregnancy status, drug use, race, and their interactions influence two length-of-incarceration outcomes: sentence length and amount of time spent in jail between arrest and imprisonment. The results collectively indicate that pregnancy decreases length of incarceration outcomes for those offenders who are not substance-involved but not evenhandedly -- benefitting white …


Novel Random Forest Methods And Algorithms For Autism Spectrum Disorders Research, Afrooz Jahedi Jan 2020


CGU Theses & Dissertations

Random Forest (RF) is a flexible, easy-to-use machine learning algorithm that was proposed by Leo Breiman in 2001 for building a predictor ensemble with a set of decision trees that grow in randomly selected subspaces of data. Its superior prediction accuracy has made it one of the most used algorithms in the machine learning field. In this dissertation, we use the random forest as the main building block for creating a proximity matrix for multivariate matching and diagnostic classification problems that are used for autism research (as an exemplary application). In observational studies, matching is used to optimize the balance …


A Multinational Study Of The Etiology And Clinical Teleology Of Moral Evaluations Of Patient Behaviors, Anna Yu Lee Jan 2020


CGU Theses & Dissertations

This dissertation is a collection of four studies which collectively explore a hypothesized construct of ‘moral evaluation of patient behaviors’ (MEPB) as a driver of health professionals’ readiness to interact humanistically with their patients. In these studies, ‘humanistic interactions’ refer to the non-technical, intangible skills and factors of clinical competence; the factors specifically explored in these studies were compassion toward patients, self-efficacy for treating patients, and optimism toward patient treatment. For the purpose of specificity, all factors were examined as they pertained to patients with substance use disorders. Survey data from a convenience sample of 524 health professionals (i.e. physicians, …


How Machine Learning And Probability Concepts Can Improve Nba Player Evaluation, Harrison Miller Jan 2020


CMC Senior Theses

In this paper I will be breaking down a scholarly article, written by Sameer K. Deshpande and Shane T. Jensen, that proposed a new method to evaluate NBA players. The NBA is the highest level professional basketball league in America and stands for the National Basketball Association. They proposed building a model that estimates how NBA players impact their team's chances of winning a game, using machine learning and probability concepts. I preface that by diving into these concepts and their mathematical backgrounds. These concepts include building a linear model using the ordinary least squares method, the bias …


Causal Effect Random Forest Of Interaction Trees For Learning Individualized Treatment Regimes In Observational Studies: With Applications To Education Study Data, Luo Li Jan 2020


CGU Theses & Dissertations

Learning individualized treatment regimes (ITR) using observational data holds great interest in various fields, as treatment recommendations based on individual characteristics may improve individual treatment benefits with a reduced cost. It has long been observed that different individuals may respond to a certain treatment with significant heterogeneity. ITR can be defined as a mapping from individual characteristics to a treatment assignment. The optimal ITR is the treatment assignment that maximizes expected individual treatment effects. Rooted in personalized medicine, many studies and applications of ITR are in medical fields and clinical practice. Heterogeneous responses are also well documented in educational interventions. …


K-Means Stock Clustering Analysis Based On Historical Price Movements And Financial Ratios, Shu Bin Jan 2020


CMC Senior Theses

The 2015 article Creating Diversified Portfolios Using Cluster Analysis proposes an algorithm that uses the Sharpe ratio and results from K-means clustering conducted on companies' historical financial ratios to generate stock market portfolios. This project seeks to evaluate the performance of the portfolio-building algorithm during the beginning period of the COVID-19 recession. S&P 500 companies' historical stock price movement and their historical return on assets and asset turnover ratios are used as dissimilarity metrics for K-means clustering. After clustering, the stock with the highest Sharpe ratio from each cluster is picked to become a part of the portfolio. The economic and …
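The selection step described in the abstract (one stock per cluster, chosen by Sharpe ratio) can be sketched roughly as follows. This is only an illustration: it assumes cluster labels have already been produced by some K-means run, uses a sample Sharpe ratio with a zero risk-free rate by default, and the tickers `AAA`, `BBB`, `CCC` and their returns are hypothetical.

```python
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free=0.0):
    """Sample Sharpe ratio: mean excess return divided by its standard deviation.

    Assumes at least two returns and a non-zero standard deviation.
    """
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess)


def pick_portfolio(returns_by_stock, cluster_of):
    """Return {cluster: ticker}, keeping the highest-Sharpe stock in each cluster."""
    best = {}
    for ticker, returns in returns_by_stock.items():
        cluster = cluster_of[ticker]
        score = sharpe_ratio(returns)
        if cluster not in best or score > best[cluster][1]:
            best[cluster] = (ticker, score)
    return {cluster: ticker for cluster, (ticker, _) in best.items()}


# Hypothetical tickers, returns, and cluster labels (illustration only).
returns_by_stock = {
    "AAA": [0.01, 0.02, 0.03],
    "BBB": [0.01, 0.05, -0.03],
    "CCC": [0.02, 0.01, 0.03],
}
cluster_of = {"AAA": 0, "BBB": 0, "CCC": 1}
portfolio = pick_portfolio(returns_by_stock, cluster_of)
print(portfolio)
```

Picking one stock per cluster is what ties the portfolio's diversification to the clustering: stocks that moved similarly end up competing for a single slot.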


Bayesian Hierarchical Meta-Analysis Of Asymptomatic Ebola Seroprevalence, Peter Brody-Moore Jan 2019


CMC Senior Theses

The continued study of asymptomatic Ebolavirus infection is necessary to develop a more complete understanding of Ebola transmission dynamics. This paper conducts a meta-analysis of eight studies that measure seroprevalence (the number of subjects that test positive for anti-Ebolavirus antibodies in their blood) in subjects with household exposure or known case-contact with Ebola, but that have shown no symptoms. In our two random effects Bayesian hierarchical models, we find estimated seroprevalences of 8.76% and 9.72%, significantly higher than the 3.3% found by a previous meta-analysis of these eight studies. We also produce a variation of this meta-analysis where we exclude …


Using Neural Networks To Classify Discrete Circular Probability Distributions, Madelyn Gaumer Jan 2019


HMC Senior Theses

Given the rise in the application of neural networks to all sorts of interesting problems, it seems natural to apply them to statistical tests. This senior thesis studies whether neural networks built to classify discrete circular probability distributions can outperform a class of well-known statistical tests for uniformity for discrete circular data that includes the Rayleigh Test, the Watson Test, and the Ajne Test. Each neural network used is relatively small with no more than 3 layers: an input layer taking in discrete data sets on a circle, a hidden layer, and an output …
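For readers unfamiliar with the baseline tests named in the abstract, the following sketch computes the standard Rayleigh statistic for a set of angles: the mean resultant length R and Z = n R^2, where larger values suggest a departure from uniformity. It is a textbook formulation only; the function name `rayleigh_statistic` is an assumption, and the thesis's treatment of discrete data and p-values is not reproduced here.

```python
import math


def rayleigh_statistic(angles):
    """Return (mean resultant length R, Rayleigh statistic Z = n * R**2).

    angles: non-empty list of directions in radians.
    """
    n = len(angles)
    c = sum(math.cos(a) for a in angles)  # summed x-components
    s = sum(math.sin(a) for a in angles)  # summed y-components
    r_bar = math.hypot(c, s) / n
    return r_bar, n * r_bar ** 2


# Four equally spaced directions cancel out; identical directions do not.
spread = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]
concentrated = [0.3, 0.3, 0.3, 0.3]
print(rayleigh_statistic(spread))
print(rayleigh_statistic(concentrated))
```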


Snap Scholar: The User Experience Of Engaging With Academic Research Through A Tappable Stories Medium, Ieva Burk Jan 2019


CMC Senior Theses

Despite the shift toward learning and consuming information through our mobile devices, most academic research is still presented only in long-form text. The Stanford Scholar Initiative has explored the segment of content creation and consumption of academic research through video. However, there has been another popular shift in presenting information from various social media platforms and media outlets in the past few years. Snapchat and Instagram have introduced the concept of tappable “Stories” that have gained popularity in the realm of content consumption.

To accelerate the growth of the creation of these research talks, I propose an alternative to video: …


On Cluster Robust Models, José Bayoán Santiago Calderón Jan 2019


CGU Theses & Dissertations

Cluster robust models are a kind of statistical model that attempts to estimate parameters considering potential heterogeneity in treatment effects. Absent heterogeneity in treatment effects, the partial and average treatment effects are the same. When heterogeneity in treatment effects occurs, the average treatment effect is a function of the various partial treatment effects and the composition of the population of interest. The first chapter explores the performance of common estimators as a function of the presence of heterogeneity in treatment effects and other characteristics that may influence their performance for estimating average treatment effects. The second chapter examines various approaches …


A Tacticians Guide To Conflict, Vol. 1: Advancing Explanations & Predictions Of Intrastate Conflict, Khaled Eid Jan 2019


CGU Theses & Dissertations

Intrastate conflict is an ever-evolving problem – causes, explanations, and predictions are increasingly murky as traditional methods of analysis focus on structural issues as precursors of conflict. Oftentimes these theories do not consider the underlying meso and micro dynamics that can provide vital insights into the phenomena. Tactical decision-makers are left using models that rely on highly aggregated, country-level data to create proper courses of action (COAs) to address or predict conflict. The shortcoming is that conflicts morph quite rapidly and structural variables can struggle to capture such dynamic changes. To address this, some tacticians are using big data …


Iterative Matrix Factorization Method For Social Media Data Location Prediction, Natchanon Suaysom Jan 2018


HMC Senior Theses

The locations from which users posted their tweets, as collected by social media companies, vary in accuracy, and some are missing. We want to use the tweets with the highest accuracy to help fill in the data for tweets with incomplete information. To test our algorithm, we used sets of social media data from a city and separated them into training sets, where we know all the information, and testing sets, where we intentionally pretend not to know the location. One prediction method that was used in (Dukler, Han and Wang, 2016) requires appending one-hot …


Sequential Probing With A Random Start, Joshua Miller Jan 2018


HMC Senior Theses

Processing user requests quickly requires not only fast servers, but also demands methods to quickly locate idle servers to process those requests. Methods of finding idle servers are analogous to open addressing in hash tables, but with the key difference that servers may return to an idle state after having been busy rather than staying busy. Probing sequences for open addressing are well-studied, but algorithms for locating idle servers are less understood. We investigate sequential probing with a random start as a method for finding idle servers, especially in cases of heavy traffic. We present a procedure for finding the …
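The probing idea named in the abstract can be illustrated with a minimal sketch: pick a random starting server, then check servers one after another (wrapping around) until an idle one is found. The representation of server state as a list of booleans and the function name `find_idle_server` are assumptions made for illustration; the thesis's own procedure and analysis are not reproduced here.

```python
import random


def find_idle_server(busy, rng=random):
    """Probe servers sequentially from a random start.

    busy: non-empty list of booleans, True meaning the server is busy.
    Returns (index, probes) for the first idle server found,
    or (None, len(busy)) if every server is busy.
    """
    n = len(busy)
    start = rng.randrange(n)
    for probes in range(1, n + 1):
        idx = (start + probes - 1) % n  # wrap around the end of the list
        if not busy[idx]:
            return idx, probes
    return None, n


# Hypothetical snapshot of four servers; only server 2 is idle.
servers = [True, True, False, True]
index, probes = find_idle_server(servers, random.Random(0))
print(index, probes)
```

Under heavy traffic most entries are busy, so the number of probes before success is the quantity of interest; the sketch returns it alongside the index so it can be tallied over many trials.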


Predictive Golf Analytics Versus The Daily Fantasy Sports Market, John O'Malley Jan 2018


CMC Senior Theses

This study examines the different skills necessary for PGA tour players to succeed at specific annual tournaments, in order to create a predictive model for DraftKings PGA contests. The model takes into account data from the PGA Tour ShotLink Intelligence Program. The predictive model is created each week based on past results from the specific tournament in question, with the hope of predicting a group of twenty-five players who should be successful based on their statistical profile. The results of the model are detailed in this paper, which covers the first nine weeks of the 2017 PGA Tour season, with …


Step-Selection Functions For Modeling Animal Movement -- Case Study: African Buffalo, Maia Adar Jan 2018


CMC Senior Theses

Understanding what factors influence wildlife movement allows landscape planners to make informed decisions that benefit both animals and humans. New quantitative methods, such as step-selection functions, provide valuable objective analyses of wildlife connectivity. This paper provides a framework for creating a step-selection function and demonstrates its use in a case study. The first section provides a general introduction about wildlife connectivity research. The second section explains the math behind the step-selection function using a simple example. The last section gives the results of a step-selection model for African buffalo in the Kavango Zambezi Transfrontier Conservation Area. Buffalo were found to …


A New Approximation Scheme For Monte Carlo Applications, Bo Jones Jan 2017


CMC Senior Theses

Approximation algorithms employing Monte Carlo methods, across application domains, often require as a subroutine the estimation of the mean of a random variable with support on [0,1]. One wishes to estimate this mean to within a user-specified error, using as few samples from the simulated distribution as possible. In the case that the mean being estimated is small, one is then interested in controlling the relative error of the estimate. We introduce a new (epsilon, delta) relative error approximation scheme for [0,1] random variables and provide a comparison of this algorithm's performance to that of an existing approximation scheme, both …
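To make the estimation subroutine concrete, here is a sketch of a standard baseline, not the scheme introduced in the thesis: a plain sample mean whose sample size is fixed in advance by Hoeffding's inequality. Note that this baseline controls absolute error (the estimate is within eps of the true mean with probability at least 1 - delta), whereas the thesis is concerned with relative error. The function names are assumptions for illustration.

```python
import math
import random


def hoeffding_sample_size(eps, delta):
    """Smallest n with 2 * exp(-2 * n * eps**2) <= delta for [0, 1] variables."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def estimate_mean(sample, eps, delta):
    """Average n independent draws from sample(), a function returning values in [0, 1]."""
    n = hoeffding_sample_size(eps, delta)
    return sum(sample() for _ in range(n)) / n


# Hypothetical usage: estimate the mean of a uniform [0, 1] variable.
rng = random.Random(0)
estimate = estimate_mean(rng.random, eps=0.1, delta=0.05)
print(hoeffding_sample_size(0.1, 0.05), estimate)
```

Because the required sample size grows like 1/eps^2 regardless of the mean, this baseline becomes expensive when the mean is small and eps must shrink with it, which is the situation that motivates relative-error schemes.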


Kinetic Monte Carlo Methods For Computing First Capture Time Distributions In Models Of Diffusive Absorption, Daniel Schmidt Jan 2017


HMC Senior Theses

In this paper, we consider the capture dynamics of a particle undergoing a random walk above a sheet of absorbing traps. In particular, we seek to characterize the distribution in time from when the particle is released to when it is absorbed. This problem is motivated by the study of lymphocytes in the human blood stream; for a particle near the surface of a lymphocyte, how long will it take for the particle to be captured? We model this problem as a diffusive process with a mixture of reflecting and absorbing boundary conditions. The model is analyzed from two approaches. …


The Document Similarity Network: A Novel Technique For Visualizing Relationships In Text Corpora, Dylan Baker Jan 2017


HMC Senior Theses

With the abundance of written information available online, it is useful to be able to automatically synthesize and extract meaningful information from text corpora. We present a unique method for visualizing relationships between documents in a text corpus. By using Latent Dirichlet Allocation to extract topics from the corpus, we create a graph whose nodes represent individual documents and whose edge weights indicate the distance between topic distributions in documents. These edge lengths are then scaled using multidimensional scaling techniques, such that more similar documents are clustered together. Applying this method to several datasets, we demonstrate that these graphs are …


Machine Learning On Statistical Manifold, Bo Zhang Jan 2017


HMC Senior Theses

This senior thesis project explores and generalizes some fundamental machine learning algorithms from the Euclidean space to the statistical manifold, an abstract space in which each point is a probability distribution. In this thesis, we adapt the optimal separating hyperplane, the k-means clustering method, and the hierarchical clustering method for classifying and clustering probability distributions. In these modifications, we use the statistical distances as a measure of the dissimilarity between objects. We describe a situation where the clustering of probability distributions is needed and useful. We present many interesting and promising empirical clustering results, which demonstrate the statistical-distance-based clustering algorithms …