Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 21 of 21

Full-Text Articles in Physical Sciences and Mathematics

Application Of Sentiment Analysis And Machine Learning Techniques To Predict Daily Cryptocurrency Price Returns, Edward Wu Jan 2023

CMC Senior Theses

This paper examines the effects of social media sentiment relating to Bitcoin on the daily price returns of Bitcoin and other popular cryptocurrencies by utilizing sentiment analysis and machine learning techniques to predict daily price returns. Many investors think that social media sentiment affects cryptocurrency prices. However, the results of this paper find that social media sentiment relating to Bitcoin does not add significant predictive value to forecasting daily price returns for each of the six cryptocurrencies used for analysis and that machine learning models that do not assume linearity between the current day price return and previous daily price …


Containing Compounding Container Congestion, Curtis Salinger Jan 2022

CMC Senior Theses

The Covid-19 pandemic caused major disruptions throughout the container shipping supply chain. Professor Dongping Song of Liverpool University wrote a paper discussing the logistical vulnerabilities in the supply chain, including the issue of congestion in ports. This paper examines the Port of Los Angeles from 2018 to 2021 as it relates to Song’s paper to see how its operations were impacted during the Covid-19 timeframe. It is found that labor shortages, chassis shortages, and changes in trade behavior each contributed to the congestion. Unfortunately, the implemented policies were insufficient to bolster the port against sustained challenges, and congestion continues to worsen.


Using Short Bursts To Optimize Redistricting In Georgia, Vedika Vishweshwar Jan 2022

CMC Senior Theses

Identifying extreme outliers in large state spaces is a difficult problem. I consider this problem in the context of finding political districting plans that maximize the number of districts in which the majority of the population is from a minority group, such as African Americans. Since the set of all possible districting plans is enormous and unfeasible to examine in practice, this paper proposes a sampling method to find these outlying plans. Specifically, this paper experiments with short bursts in the context of minority voting rights in Georgia. Short bursts are a type of Markov Chain in …
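The short-burst idea can be illustrated on a toy problem. The sketch below is not the thesis's implementation: it assumes the common formulation in which each burst is a few unbiased random-walk steps and the next burst restarts from the best-scoring state seen so far. The bit-tuple state, the `flip_one_bit` proposal, and the use of `sum` as the score are hypothetical stand-ins for districting plans, plan proposals, and the count of majority-minority districts.

```python
import random


def flip_one_bit(state, rng):
    """Hypothetical proposal: flip one uniformly chosen bit of a tuple of 0/1 values."""
    i = rng.randrange(len(state))
    return state[:i] + (1 - state[i],) + state[i + 1:]


def short_bursts(initial, propose, score, burst_length, num_bursts, rng):
    """Run short unbiased bursts, restarting each one from the best state seen so far."""
    best, best_score = initial, score(initial)
    for _ in range(num_bursts):
        state = best  # every burst starts from the current best state
        for _ in range(burst_length):
            state = propose(state, rng)  # unbiased step: always move
            current = score(state)
            if current >= best_score:  # accept ties so the search keeps drifting
                best, best_score = state, current
    return best, best_score


rng = random.Random(0)
start = (0,) * 12
best, best_score = short_bursts(start, flip_one_bit, sum, 5, 50, rng)
print(best, best_score)
```

Because each burst begins at the best state found so far, the best score never decreases from one burst to the next, while the unbiased steps inside a burst still allow temporary moves to lower-scoring states.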


Feature Investigation For Stock Returns Prediction Using XGBoost And Deep Learning Sentiment Classification, Seungho (Samuel) Lee Jan 2021

CMC Senior Theses

This paper attempts to quantify the predictive power of social media sentiment and financial data in stock prediction by utilizing a comprehensive set of stock-related fundamental and technical variables and social media sentiments. For conducting sentiment analysis, this study employs a pretrained finBERT model that provides three different sentiment classifications and respective softmax scores. The significance of these variables is then evaluated with XGBoost regression and SHapley Additive exPlanations (SHAP) frameworks. Through investigating feature importance, this study finds that statistical properties of sentiment variables provide a stronger predictive power than a weighted sentiment score and that it is possible to quantify …


An Evaluation Of Knot Placement Strategies For Spline Regression, William Klein Jan 2021

CMC Senior Theses

Regression splines have an established value for producing a quality fit with a relatively low-degree polynomial. This paper explores the implications of adopting new methods for knot selection in tandem with established methodology from the current literature. Structural features of generated datasets, as well as residuals collected from sequential iterative models, are used to augment the equidistant knot selection process. From analyzing a simulated dataset and an application to the Racial Animus dataset, I find that a B-spline basis paired with equally-spaced knots remains the best choice when data are evenly distributed, even when structural features of a dataset are known …


Using Twitter API To Solve The GOAT Debate: Michael Jordan Vs. LeBron James, Jordan Trey Leonard Jan 2021

CMC Senior Theses

Using a Twitter API, I gather and analyze tweets by performing sentiment analysis to solve the GOAT debate among professional athletes with the primary focus on comparing Michael Jordan and LeBron James. Athletes from the National Football League (NFL), the National Basketball Association (NBA), Major League Baseball (MLB), and the National Collegiate Athletic Association (NCAA) Division 1 Men's and Women's Basketball were selected to compare how sentiment polarity varies across sports. Sentiment polarity is measured by labeling text as "positive", "neutral", or "negative" which allows us to determine which athlete/sport is highly favored among the Twitter community when it comes …


Information Prioritization: A Comparison Between Utility Maximizers And Probability Matchers, Yusuf Ismaeel Jan 2021

CMC Senior Theses

This thesis examines the differences between probability matchers and utility maximizers in their preferences for information sources in a lab environment. In this paper, we consider the best source of information to be the most connected one. We conducted several linear probability model type regressions along with logit regressions. Furthermore, we also attempted to control and fix any potential misclassifications in classifying the cognitive strategy by using instrumental variables. The results show that utility maximizers will almost always choose the most informed node. Probability matchers, on the other hand, do not exhibit such a behavior as the probability matching strategy …


K-Means Stock Clustering Analysis Based On Historical Price Movements And Financial Ratios, Shu Bin Jan 2020

CMC Senior Theses

The 2015 article "Creating Diversified Portfolios Using Cluster Analysis" proposes an algorithm that uses the Sharpe ratio and results from K-means clustering conducted on companies' historical financial ratios to generate stock market portfolios. This project seeks to evaluate the performance of the portfolio-building algorithm during the beginning period of the COVID-19 recession. S&P 500 companies' historical stock price movement and their historical return on assets and asset turnover ratios are used as dissimilarity metrics for K-means clustering. After clustering, the stock with the highest Sharpe ratio from each cluster is picked to become part of the portfolio. The economic and …
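The selection step described above (one stock per cluster, chosen by Sharpe ratio) can be sketched as follows. This is a minimal illustration, not the article's or the thesis's code: it assumes cluster labels have already been produced by K-means, uses a zero risk-free rate by default, and the tickers, return series, and function names are all hypothetical.

```python
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by the sample standard deviation of excess returns."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess)


def pick_portfolio(returns_by_ticker, cluster_by_ticker):
    """Return {cluster label: ticker with the highest Sharpe ratio in that cluster}."""
    best = {}
    for ticker, returns in returns_by_ticker.items():
        cluster = cluster_by_ticker[ticker]
        ratio = sharpe_ratio(returns)
        if cluster not in best or ratio > best[cluster][1]:
            best[cluster] = (ticker, ratio)
    return {cluster: ticker for cluster, (ticker, _) in best.items()}


# Hypothetical periodic returns and precomputed cluster labels.
returns_by_ticker = {
    "AAA": [0.01, 0.02, 0.03],
    "BBB": [0.01, -0.01, 0.03],
    "CCC": [0.02, 0.00, 0.01],
    "DDD": [-0.01, 0.00, -0.02],
}
cluster_by_ticker = {"AAA": 0, "BBB": 0, "CCC": 1, "DDD": 1}
print(pick_portfolio(returns_by_ticker, cluster_by_ticker))
```

Taking one stock per cluster is what ties the clustering to diversification: stocks in different clusters are, by construction, dissimilar on the chosen metrics.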


How Machine Learning And Probability Concepts Can Improve NBA Player Evaluation, Harrison Miller Jan 2020

CMC Senior Theses

In this paper I break down a scholarly article, written by Sameer K. Deshpande and Shane T. Jensen, that proposed a new method to evaluate NBA players. The NBA (National Basketball Association) is the highest-level professional basketball league in America. The authors proposed building a model that estimates how NBA players impact their team's chances of winning a game, using machine learning and probability concepts. I preface that by diving into these concepts and their mathematical backgrounds. These concepts include building a linear model using the ordinary least squares method, the bias …


Snap Scholar: The User Experience Of Engaging With Academic Research Through A Tappable Stories Medium, Ieva Burk Jan 2019

CMC Senior Theses

With the shift to learn and consume information through our mobile devices, most academic research is still only presented in long-form text. The Stanford Scholar Initiative has explored the segment of content creation and consumption of academic research through video. However, there has been another popular shift in presenting information from various social media platforms and media outlets in the past few years. Snapchat and Instagram have introduced the concept of tappable “Stories” that have gained popularity in the realm of content consumption.

To accelerate the growth of the creation of these research talks, I propose an alternative to video: …


Bayesian Hierarchical Meta-Analysis Of Asymptomatic Ebola Seroprevalence, Peter Brody-Moore Jan 2019

CMC Senior Theses

The continued study of asymptomatic Ebolavirus infection is necessary to develop a more complete understanding of Ebola transmission dynamics. This paper conducts a meta-analysis of eight studies that measure seroprevalence (the number of subjects that test positive for anti-Ebolavirus antibodies in their blood) in subjects with household exposure or known case-contact with Ebola, but that have shown no symptoms. In our two random effects Bayesian hierarchical models, we find estimated seroprevalences of 8.76% and 9.72%, significantly higher than the 3.3% found by a previous meta-analysis of these eight studies. We also produce a variation of this meta-analysis where we exclude …


Step-Selection Functions For Modeling Animal Movement -- Case Study: African Buffalo, Maia Adar Jan 2018

CMC Senior Theses

Understanding what factors influence wildlife movement allows landscape planners to make informed decisions that benefit both animals and humans. New quantitative methods, such as step-selection functions, provide valuable objective analyses of wildlife connectivity. This paper provides a framework for creating a step-selection function and demonstrates its use in a case study. The first section provides a general introduction about wildlife connectivity research. The second section explains the math behind the step-selection function using a simple example. The last section gives the results of a step-selection model for African buffalo in the Kavango Zambezi Transfrontier Conservation Area. Buffalo were found to …


Predictive Golf Analytics Versus The Daily Fantasy Sports Market, John O'Malley Jan 2018

CMC Senior Theses

This study examines the different skills necessary for PGA tour players to succeed at specific annual tournaments, in order to create a predictive model for DraftKings PGA contests. The model takes into account data from the PGA Tour ShotLink Intelligence Program. The predictive model is created each week based on past results from the specific tournament in question, with the hope of predicting a group of twenty-five players who should be successful based on their statistical profile. The results of the model are detailed in this paper, which covers the first nine weeks of the 2017 PGA Tour season, with …


A New Approximation Scheme For Monte Carlo Applications, Bo Jones Jan 2017

CMC Senior Theses

Approximation algorithms employing Monte Carlo methods, across application domains, often require as a subroutine the estimation of the mean of a random variable with support on [0,1]. One wishes to estimate this mean to within a user-specified error, using as few samples from the simulated distribution as possible. In the case that the mean being estimated is small, one is then interested in controlling the relative error of the estimate. We introduce a new (epsilon, delta) relative error approximation scheme for [0,1] random variables and provide a comparison of this algorithm's performance to that of an existing approximation scheme, both …
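For reference, the usual meaning of an (epsilon, delta) relative error guarantee can be written as below. The symbols are notation assumed here rather than taken from the thesis: $\hat{\mu}$ is the estimate and $\mu > 0$ is the true mean of the [0,1] random variable.

```latex
\[
  \Pr\bigl( \lvert \hat{\mu} - \mu \rvert > \epsilon \, \mu \bigr) \le \delta ,
  \qquad \mu > 0 .
\]
```

The error tolerance $\epsilon \mu$ shrinks with the mean itself, which is why a relative guarantee is the natural one when the mean being estimated is small: an absolute tolerance of, say, 0.01 says little about a mean near 0.001.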


Applications Of Monte Carlo Methods In Statistical Inference Using Regression Analysis, Ji Young Huh Jan 2015

CMC Senior Theses

This paper studies the use of Monte Carlo simulation techniques in the field of econometrics, specifically statistical inference. First, I examine several estimators by deriving properties explicitly and generate their distributions through simulations. Here, simulations are used to illustrate and support the analytical results. Then, I look at test statistics where derivations are costly because of the sensitivity of their critical values to the data generating processes. Simulations here establish significance and necessity for drawing statistical inference. Overall, the paper examines when and how simulations are needed in studying econometric theories.
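As an illustration of generating an estimator's distribution by simulation, the sketch below repeatedly draws data from a known linear model and records the ordinary least squares slope each time. It is a generic example, not taken from the thesis: the data generating process (fixed regressors on [0, 1), standard normal errors), the true slope, and all function names are assumptions.

```python
import random
from statistics import mean, stdev


def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_slopes(true_slope, n_obs, n_reps, rng):
    """Draw n_reps datasets from y = true_slope * x + N(0, 1) noise and collect the OLS slopes."""
    x = [i / n_obs for i in range(n_obs)]  # fixed regressors, assumed for simplicity
    slopes = []
    for _ in range(n_reps):
        y = [true_slope * a + rng.gauss(0.0, 1.0) for a in x]
        slopes.append(ols_slope(x, y))
    return slopes


slopes = simulate_slopes(true_slope=2.0, n_obs=50, n_reps=2000, rng=random.Random(1))
print(mean(slopes), stdev(slopes))
```

The mean of the simulated slopes should sit close to the true slope, which is the simulation counterpart of the analytical unbiasedness result, and their spread approximates the estimator's standard error.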


Acceptance-Rejection Sampling With Hierarchical Models, Christian A. Ayala Jan 2015

CMC Senior Theses

Hierarchical models provide a flexible way of modeling complex behavior. However, the complicated interdependencies among the parameters in the hierarchy make training such models difficult. MCMC methods have been widely used for this purpose, but can often only approximate the necessary distributions. Acceptance-rejection sampling allows for perfect simulation from these often unnormalized distributions by drawing from another distribution over the same support. The efficacy of acceptance-rejection sampling is explored through application to a small dataset which has been widely used for evaluating different methods for inference on hierarchical models. A particular algorithm is developed to draw variates from the posterior …
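The basic acceptance-rejection step can be sketched with a textbook example. This is not the algorithm developed in the thesis: the target below is an unnormalized Beta(3, 2) density on [0, 1], the proposal is Uniform(0, 1), and the function names and envelope constant are chosen here for illustration.

```python
import random


def unnormalized_beta_3_2(x):
    """Unnormalized Beta(3, 2) density on [0, 1]; its normalizing constant is never needed."""
    return x * x * (1.0 - x)


def accept_reject(density, bound, n, rng):
    """Draw n exact samples from `density` on [0, 1] using a Uniform(0, 1) proposal.

    `bound` must be at least the maximum of `density` on [0, 1].
    """
    samples = []
    while len(samples) < n:
        x = rng.random()  # candidate from the proposal
        u = rng.random()  # uniform height under the envelope
        if u * bound <= density(x):  # keep x with probability density(x) / bound
            samples.append(x)
    return samples


# The maximum of x^2 (1 - x) on [0, 1] is 4/27, attained at x = 2/3.
draws = accept_reject(unnormalized_beta_3_2, 4.0 / 27.0, 5000, random.Random(0))
print(sum(draws) / len(draws))  # Beta(3, 2) has mean 0.6
```

Accepted draws follow the target exactly rather than approximately, which is the sense in which the method gives perfect simulation. The cost is wasted proposals: the looser the bound, the lower the acceptance rate.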


Scalable Collaborative Filtering Recommendation Algorithms On Apache Spark, Walker Evan Casey Jan 2014

CMC Senior Theses

Collaborative filtering based recommender systems use information about a user's preferences to make personalized predictions about content, such as topics, people, or products, that they might find relevant. As the volume of accessible information and active users on the Internet continues to grow, it becomes increasingly difficult to compute recommendations quickly and accurately over a large dataset. In this study, we will introduce an algorithmic framework built on top of Apache Spark for parallel computation of the neighborhood-based collaborative filtering problem, which allows the algorithm to scale linearly with a growing number of users. We also investigate several different variants …
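A single-machine sketch of neighborhood-based collaborative filtering is shown below. It does not use Apache Spark and is not the framework from the thesis: it assumes user-to-user cosine similarity and a similarity-weighted average over the k most similar users, and the ratings data and function names are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse rating dicts {item: rating}."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def predict(ratings, user, item, k=2):
    """Predict a rating as the similarity-weighted mean over the k nearest users who rated `item`."""
    neighbors = [
        (cosine(ratings[user], other_ratings), other_ratings[item])
        for other, other_ratings in ratings.items()
        if other != user and item in other_ratings
    ]
    neighbors.sort(key=lambda pair: pair[0], reverse=True)
    top = [(sim, rating) for sim, rating in neighbors[:k] if sim > 0]
    if not top:
        return None  # no usable neighbor has rated this item
    return sum(sim * rating for sim, rating in top) / sum(sim for sim, _ in top)


ratings = {
    "ann": {"a": 5, "b": 3},
    "bob": {"a": 5, "b": 3, "c": 4},
    "cat": {"a": 1, "b": 5, "c": 2},
}
print(predict(ratings, "ann", "c"))
```

The similarity computations for different user pairs are independent of one another, which is the property a parallel framework can exploit by distributing them across workers.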


State Level Earned Income Tax Credit’s Effects On Race And Age: An Effective Poverty Reduction Policy, Anthony J. Barone Jan 2013

CMC Senior Theses

In this paper, I analyze the effectiveness of state level Earned Income Tax Credit programs in improving poverty levels. I conducted this analysis for the years 1991 through 2011 using a panel data model with fixed effects. The main independent variables of interest were the state and federal EITC rates, minimum wage, gross state product, population, and unemployment, all by state. I determined that increases to the state EITC rates provided only a slight decrease to both the overall white below-poverty population and the corresponding white childhood population under 18, while both the overall and the under-18 black population for …


NFL Betting Market: Using Adjusted Statistics To Test Market Efficiency And Build A Betting Model, James P. Donnelly Jan 2013

CMC Senior Theses

The use of statistical analysis has been prevalent in the sports gambling industry for years. More recently, we have seen the emergence of "adjusted statistics", a more sophisticated way to examine each play and each result (further explanation below). And while adjusted statistics have become commonplace for professional and recreational bettors alike, little research has been done to justify their use. In this paper the effectiveness of this data is tested on the most heavily wagered sport in the world – the National Football League (NFL). The results are studied with two central questions in mind: Does the market account …


How Other Drivers’ Vehicle Characteristics Influence Your Driving Speed, Russell Brockett Jan 2011

CMC Senior Theses

The effect of passing vehicles’ characteristics on other drivers’ speeds was investigated. Three experimental studies were proposed and likely outcomes were discussed. Experiment 1 focused on the effect of passing vehicle type (SUV, sedan or truck) on driver speed. Drivers were hypothesized to go faster when a vehicle of the same type as their own passed them than when no vehicle or a different vehicle passed them. Experiment 2 focused on the effect of passing SUV age on driver speed. Evidence suggests passing older SUVs will increase the driver’s speed more than new SUVs. Experiment …


Applying Localized Realized Volatility Modeling To Futures Indices, Luella Fu Jan 2011

CMC Senior Theses

This thesis extends the application of the localized realized volatility model created by Ying Chen, Wolfgang Karl Härdle, and Uta Pigorsch to other futures markets, particularly the CAC 40 and the NI 225. The research attempted to replicate results, though ultimately those results were invalidated by procedural difficulties.