Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 30 of 69

Full-Text Articles in Physical Sciences and Mathematics

Exploration And Statistical Modeling Of Profit, Caleb Gibson Dec 2023

Undergraduate Honors Theses

For any company involved in sales, maximization of profit is the driving force that guides all decision-making. Many factors can influence how profitable a company can be, including external factors like changes in inflation or consumer demand or internal factors like pricing and product cost. By understanding specific trends in its own internal data, a company can readily identify problem areas or potential growth opportunities to help increase profitability.

In this discussion, we use an extensive data set to examine how a company might analyze their own data to identify potential changes the company might investigate to drive better performance. Based …


Geometric Morphometric Analysis Of Modern Viperid Vertebrae Facilitates Identification Of Fossil Specimens, Lance D. Jessee Aug 2023

Electronic Theses and Dissertations

Snake vertebrae are common in the fossil record, whereas cranial remains are generally fragile and rare. Consequently, vertebrae are the most commonly studied fossil element of snakes. However, identification of snake vertebrae can be problematic due to extensive variation. This study utilizes 2-D geometric morphometrics and canonical variates analysis to 1) reveal variation between genera and species and 2) classify vertebrae of modern and fossil eastern North American Agkistrodon and Crotalus. The results show that vertebrae of Agkistrodon and Crotalus can reliably be classified to genus and species using these methods. Based on the statistical analyses, four of the …


Predicting High-Cap Tech Stock Polarity: A Combined Approach Using Support Vector Machines And Bidirectional Encoders From Transformers, Ian L. Grisham May 2023

Electronic Theses and Dissertations

The abundance, accessibility, and scale of data have engendered an era where machine learning can quickly and accurately solve complex problems, identify complicated patterns, and uncover intricate trends. One research area where many have applied these techniques is the stock market. Yet, financial domains are influenced by many factors and are notoriously difficult to predict due to their volatile and multivariate behavior. However, the literature indicates that public sentiment data may exhibit significant predictive qualities and improve a model’s ability to predict intricate trends. In this study, momentum SVM classification accuracy was compared between datasets that did and did not …


Finding A Representative Distribution For The Tail Index Alpha, α, For Stock Return Data From The New York Stock Exchange, Jett Burns May 2022

Electronic Theses and Dissertations

Statistical inference is a tool for creating models that can accurately display real-world events. Special importance is given to the financial methods that model risk and large price movements. A parameter that describes tail heaviness, and risk overall, is α. This research finds a representative distribution that models α. The absolute values of standardized stock returns from the Center for Research in Security Prices (CRSP) are used in this research. The inference is performed using R. Approximations for α are found using the ptsuite package. The GAMLSS package employs maximum likelihood estimation to estimate distribution parameters using the CRSP data. The …


Intraday Algorithmic Trading Using Momentum And Long Short-Term Memory Network Strategies, Andrew R. Whitinger II May 2022

Undergraduate Honors Theses

Intraday stock trading is an infamously difficult and risky strategy. Momentum and reversal strategies and long short-term memory (LSTM) neural networks have been shown to be effective for selecting stocks to buy and sell over time periods of multiple days. To explore whether these strategies can be effective for intraday trading, their implementations were simulated using intraday price data for stocks in the S&P 500 index, collected at 1-second intervals between February 11, 2021 and March 9, 2021 inclusive. The study tested 160 variations of momentum and reversal strategies for profitability in long, short, and market-neutral portfolios, totaling 480 portfolios. …


Functional Mixed Data Clustering With Fourier Basis Smoothing, Ishmael Amartey Dec 2021

Electronic Theses and Dissertations

Clustering is an important analytical technique that has proven to affect human life positively through its application in cancer research, market segmentation, city planning etc. In this time of growing technological systems, mixed data has seen another face of longitudinal, directional and functional attributes which is worth paying attention to and analyzing. Previous research works on clustering relied largely on the inverse weight technique and B-spline in smoothing data and assessing the performance of various clustering algorithms. In 1971, Gower proposed a method of clustering for mixed variable types which has been extended to include functional and directional variables by …


Performance Comparison Of Imputation Methods For Mixed Data Missing At Random With Small And Large Sample Data Set With Different Variability, Kyei Afari Aug 2021

Electronic Theses and Dissertations

One of the concerns in the field of statistics is the presence of missing data, which leads to bias in parameter estimation and inaccurate results. However, the multiple imputation procedure is a remedy for handling missing data. This study looked at the best multiple imputation methods used to handle mixed variable datasets with different sample sizes and variability along with different levels of missingness. The study employed the predictive mean matching, classification and regression trees, and the random forest imputation methods. For each dataset, the multiple regression parameter estimates for the complete datasets were compared to the multiple regression parameter …


Applying Deep Learning To The Ice Cream Vendor Problem: An Extension Of The Newsvendor Problem, Gaffar Solihu Aug 2021

Electronic Theses and Dissertations

The Newsvendor problem is a classical supply chain problem used to develop strategies for inventory optimization. The goal of the newsvendor problem is to predict the optimal order quantity of a product to meet an uncertain demand in the future, given that the demand distribution itself is known. The Ice Cream Vendor Problem extends the classical newsvendor problem to an uncertain demand with unknown distribution, albeit a distribution that is known to depend on exogenous features. The goal is thus to estimate the order quantity that minimizes the total cost when demand does not follow any known statistical distribution. The …
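
As an illustration of the known-distribution case described above, the classical newsvendor order quantity is commonly written as the critical-ratio quantile of the demand distribution. The sketch below is not taken from the thesis: it assumes normally distributed demand, and the function name, prices, and costs are hypothetical.

```python
from statistics import NormalDist


def newsvendor_order_quantity(mean, stdev, unit_cost, unit_price, salvage=0.0):
    """Classical newsvendor quantity, assuming normal demand (an assumption)."""
    underage = unit_price - unit_cost  # margin lost per unit of unmet demand
    overage = unit_cost - salvage      # loss per unit left unsold
    critical_ratio = underage / (underage + overage)
    # Order up to the demand quantile that matches the critical ratio.
    return NormalDist(mean, stdev).inv_cdf(critical_ratio)


# Hypothetical numbers: demand ~ N(100, 20), cost 4, price 10, no salvage value.
q = newsvendor_order_quantity(mean=100, stdev=20, unit_cost=4, unit_price=10)
print(round(q, 2))
```

When the margin exceeds the cost of an unsold unit, the critical ratio is above one half and the suggested order sits above mean demand. The unknown-distribution setting the abstract goes on to describe is precisely the case where this closed form is unavailable.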


Performance Comparison Of Multiple Imputation Methods For Quantitative Variables For Small And Large Data With Differing Variability, Vincent Onyame May 2021

Electronic Theses and Dissertations

Missing data continues to be one of the main problems in data analysis, as it reduces sample representativeness and consequently causes biased estimates. Multiple imputation has been established as an effective method of handling missing data. In this study, we examined multiple imputation methods for quantitative variables on twelve data sets of varied sizes and variability that were pseudo-generated from an original data set. The multiple imputation methods examined are predictive mean matching, Bayesian linear regression, and non-Bayesian linear regression in the MICE (Multivariate Imputation by Chained Equations) package in the statistical software R. The parameter estimates generated from …


Zeta Function Regularization And Its Relationship To Number Theory, Stephen Wang May 2021

Electronic Theses and Dissertations

While the "path integral" formulation of quantum mechanics is both highly intuitive and far reaching, the path integrals themselves often fail to converge in the usual sense. Richard Feynman developed regularization as a solution, such that regularized path integrals could be calculated and analyzed within a strictly physics context. Over the past 50 years, mathematicians and physicists have retroactively introduced schemes for achieving mathematical rigor in the study and application of regularized path integrals. One such scheme was introduced in 2007 by the mathematicians Klaus Kirsten and Paul Loya. In this thesis, we reproduce the Kirsten and Loya approach to …


Communications And Methodologies In Crime Geography: Contemporary Approaches To Disseminating Criminal Incidence And Research, Mitchell Ogden Dec 2019

Electronic Theses and Dissertations

Many tools exist to assist law enforcement agencies in mitigating criminal activity. For centuries, academics used statistics in the study of crime and criminals, and more recently, police departments make use of spatial statistics and geographic information systems in that pursuit. Clustering and hot spot methods of analysis are popular in this application for their relative simplicity of interpretation and ease of process. With recent advancements in geospatial technology, it is easier than ever to publicly share data through visual communication tools like web applications and dashboards. Sharing data and results of analyses boosts transparency and the public image of …


Function Space Tensor Decomposition And Its Application In Sports Analytics, Justin Reising Dec 2019

Electronic Theses and Dissertations

Recent advancements in sports information and technology systems have ushered in a new age of applications of both supervised and unsupervised analytical techniques in the sports domain. These automated systems capture large volumes of data points about competitors during live competition. As a result, multi-relational analyses are gaining popularity in the field of Sports Analytics. We review two case studies of dimensionality reduction with Principal Component Analysis and latent factor analysis with Non-Negative Matrix Factorization applied in sports. Also, we provide a review of a framework for extending these techniques for higher order data structures. The primary scope of this …


Is Corequisite Developmental Math Effective At East Tennessee State University?, Christine Padden Aug 2019

Electronic Theses and Dissertations

This thesis looks at the corequisite developmental math program at East Tennessee State University (ETSU) and compares its effectiveness to that of the previous developmental math program by comparing student outcomes in MATH 1530. MATH 1530 is a non-calculus-based statistics and probability course that satisfies most majors’ general education math requirements. ETSU sees approximately 1,000 students a year pass through MATH 1530, which is around 6.7% of the total enrollment at ETSU[9]. We are interested in the last five years of the developmental math program before it was changed to corequisite developmental math and the first five years of corequisite …


Robustness Of Semi-Parametric Survival Model: Simulation Studies And Application To Clinical Data, Isaac Nwi-Mozu Aug 2019

Electronic Theses and Dissertations

An efficient way of analyzing survival clinical data such as cancer data is a great concern to health experts. In this study, we investigate and propose an efficient way of handling survival clinical data. Simulation studies were conducted to compare the performance of various forms of survival model techniques using the R package "survsim". Model performance was evaluated with varying sample sizes as small ($n5000$). For small and mild samples, the semi-parametric model outperforms or approximates the performance of the parametric model. However, for large samples, the parametric model outperforms the semi-parametric model. We compared the effectiveness and reliability …


Comparison Of Imputation Methods For Mixed Data Missing At Random, Kaitlyn Heidt May 2019

Electronic Theses and Dissertations

A statistician's job is to produce statistical models. When these models are precise and unbiased, we can relate them to new data appropriately. However, when data sets have missing values, the assumptions of statistical methods are violated, producing biased results. The statistician's objective is to implement methods that produce unbiased and accurate results. Research in missing data is becoming popular as modern methods that produce unbiased and accurate results are emerging, such as MICE in R, a statistical software. Using real data, we compare four common imputation methods in the MICE package in R at different levels of missingness. The …


A Comparison Of Standard Denoising Methods For Peptide Identification, Skylar Carpenter May 2019

Electronic Theses and Dissertations

Peptide identification using tandem mass spectrometry depends on matching the observed spectrum with the theoretical spectrum. The raw data from tandem mass spectrometry, however, is often not optimal because it may contain noise or measurement errors. Denoising this data can improve alignment between observed and theoretical spectra and reduce the number of peaks. The method used by Lewis et al. (2018) uses a combined constant and moving threshold to denoise spectra. We compare the effects of using the standard preprocessing methods baseline removal, wavelet smoothing, and binning on spectra with Lewis et al.’s threshold method. We consider individual methods and …


Generalizations Of The Arcsine Distribution, Rebecca Rasnick May 2019

Electronic Theses and Dissertations

The arcsine distribution looks at the fraction of time one player is winning in a fair coin toss game and has been studied for over a hundred years. There has been little further work on how the distribution changes when the coin tosses are not fair or when a player has already won the initial coin tosses or, equivalently, starts with a lead. This thesis will first cover a proof of the arcsine distribution. Then, we explore how the distribution changes when the coin is unfair. Finally, we will explore the distribution when one person has won the first …
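
The fair-coin case described above can be checked numerically. The sketch below is an illustration rather than the thesis's method: it simulates the fraction of tosses during which one player is ahead and compares it with the standard arcsine CDF, (2/π)·arcsin(√x). The tie-handling rule, the function names, and the sample sizes are assumptions made for the example.

```python
import math
import random


def fraction_of_time_leading(n_tosses, rng):
    """Fraction of tosses during which player A is ahead in a fair game."""
    lead = 0        # A's wins minus B's wins so far
    time_ahead = 0
    for _ in range(n_tosses):
        previous = lead
        lead += 1 if rng.random() < 0.5 else -1
        # Assumed convention: a tie counts as "ahead" only if A led just before it.
        if lead > 0 or (lead == 0 and previous > 0):
            time_ahead += 1
    return time_ahead / n_tosses


def arcsine_cdf(x):
    """CDF of the standard arcsine distribution on [0, 1]."""
    return (2 / math.pi) * math.asin(math.sqrt(x))


rng = random.Random(0)
fractions = [fraction_of_time_leading(500, rng) for _ in range(2000)]
empirical = sum(f <= 0.1 for f in fractions) / len(fractions)
print("simulated P(fraction <= 0.1):", empirical)
print("arcsine   P(fraction <= 0.1):", round(arcsine_cdf(0.1), 4))
```

The simulated fractions tend to pile up near 0 and 1 rather than near one half, which is the counterintuitive feature of the arcsine law.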


A Systematic Assessment Of Socio-Economic Impacts Of Prolonged Episodic Volcano Crises, Justin Peers May 2019

Electronic Theses and Dissertations

Uncertainty surrounding volcanic activity can lead to socio-economic crises with or without an eruption as demonstrated by the post-1978 response to unrest of Long Valley Caldera (LVC), CA. Extensive research in physical sciences provides a foundation on which to assess direct impacts of hazards, but fewer resources have been dedicated towards understanding human responses to volcanic risk. To evaluate natural hazard risk issues at LVC, a multi-hazard, mail-based, household survey was conducted to compare perceptions of volcanic, seismic, and wildfire hazards. Impacts of volcanic activity on housing prices and businesses were examined at the county-level for three volcanoes with a …


Clustering Mixed Data: An Extension Of The Gower Coefficient With Weighted L2 Distance, Augustine Oppong Aug 2018

Electronic Theses and Dissertations

Sorting data into partitions is becoming increasingly complex as the constituents of data grow every day. Mixed data comprises continuous, categorical, directional, functional, and other types of variables. Clustering mixed data is based on special dissimilarities of the variables. Some data types may influence the clustering solution. Assigning appropriate weight to the functional data may improve the performance of the clustering algorithm. In this paper we use the extension of the Gower coefficient with judiciously chosen weight for the L2 to cluster mixed data. The benefits of weighting are demonstrated both in applications to the Buoy data set …


The Expected Number Of Patterns In A Random Generated Permutation On [N] = {1,2,...,N}, Evelyn Fokuoh Aug 2018

Electronic Theses and Dissertations

Previous work by Flaxman (2004) and Biers-Ariel et al. (2018) focused on the number of distinct words embedded in a string of words of length n. In this thesis, we will extend this work to permutations, focusing on the maximum number of distinct permutations contained in a permutation on [n] = {1,2,...,n} and on the expected number of distinct permutations contained in a random permutation on [n]. We further considered the problem where repetition of subsequences is a result of the occurrence of (Type A and/or Type B) replications. Our method of enumerating the Type A replications causes double …


Distribution Of A Sum Of Random Variables When The Sample Size Is A Poisson Distribution, Mark Pfister Aug 2018

Electronic Theses and Dissertations

A probability distribution is a statistical function that describes the probability of possible outcomes in an experiment or occurrence. There are many different probability distributions that give the probability of an event happening, given some sample size n. An important question in statistics is to determine the distribution of the sum of independent random variables when the sample size n is fixed. For example, it is known that the sum of n independent Bernoulli random variables with success probability p is a Binomial distribution with parameters n and p. However, this is not true when the sample size …
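
The fixed-sample-size fact quoted above (a sum of n independent Bernoulli(p) variables is Binomial(n, p)) can be checked with a short simulation. This is a minimal sketch for illustration only; the values of n, p, the number of trials, and the function names are assumptions, and it deliberately covers just the fixed-n case stated in the abstract.

```python
import math
import random


def binomial_pmf(k, n, p):
    """Probability of exactly k successes under Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def simulate_bernoulli_sums(n, p, trials, rng):
    """Draw `trials` sums, each of n independent Bernoulli(p) variables."""
    return [sum(rng.random() < p for _ in range(n)) for _ in range(trials)]


rng = random.Random(1)
n, p, trials = 10, 0.3, 20000  # hypothetical settings
sums = simulate_bernoulli_sums(n, p, trials, rng)

# Compare the simulated frequencies with the Binomial(n, p) probabilities.
for k in range(n + 1):
    empirical = sums.count(k) / trials
    print(k, round(empirical, 3), round(binomial_pmf(k, n, p), 3))
```

The two printed columns should agree closely, up to simulation noise that shrinks as the number of trials grows.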


Geostatistical Analysis Of Potential Sinkhole Risk: Examining Spatial And Temporal Climate Relationships In Tennessee And Florida, Kimberly Blazzard May 2018

Electronic Theses and Dissertations

Sinkholes are a significant hazard for the southeastern United States. Although differences in climate are known to affect karst environments differently, quantitative analyses correlating sinkhole formation with climate variables are lacking. A temporal linear regression for Florida sinkholes and two modeled regressions for Tennessee sinkholes were produced: a general linearized logistic regression and a MaxEnt derived species distribution model. Temporal results showed highly significant correlations with precipitation, teleconnection patterns, temperature, and CO2, while spatial results showed highly significant correlations with precipitation, wind speed, solar radiation, and maximum temperature. Regression results indicated that some sinkhole formation variability could be …


Peptide Identification: Refining A Bayesian Stochastic Model, Theophilus Barnabas Kobina Acquah May 2017

Electronic Theses and Dissertations

Notwithstanding the challenges associated with different methods of peptide identification, other methods have been explored over the years. The complexity, size and computational challenges of peptide-based data sets call for more intrusion into this sphere. By relying on the prior information about the average relative abundances of bond cleavages and the prior probability of any specific amino acid sequence, we refine an already developed Bayesian approach in identifying peptides. The likelihood function is improved by adding additional ions to the model and its size is driven by two overall goodness of fit measures. In the face of the complexities associated …


Denoising Tandem Mass Spectrometry Data, Felix Offei May 2017

Electronic Theses and Dissertations

Protein identification using tandem mass spectrometry (MS/MS) has proven to be an effective way to identify proteins in a biological sample. An observed spectrum is constructed from the data produced by the tandem mass spectrometer. A protein can be identified if the observed spectrum aligns with the theoretical spectrum. However, data generated by the tandem mass spectrometer are affected by errors thus making protein identification challenging in the field of proteomics. Some of these errors include wrong calibration of the instrument, instrument distortion and noise. In this thesis, we present a pre-processing method, which focuses on the removal of noisy …


A Distribution Of The First Order Statistic When The Sample Size Is Random, Vincent Z. Forgo Mr May 2017

Electronic Theses and Dissertations

Statistical distributions, also known as probability distributions, are used to model a random experiment. Probability distributions consist of probability density functions (pdf) and cumulative distribution functions (cdf). Probability distributions are widely used in the area of engineering, actuarial science, computer science, biological science, physics, and other applicable areas of study. Statistics are used to draw conclusions about the population through probability models. Sample statistics such as the minimum, first quartile, median, third quartile, and maximum, referred to as the five-number summary, are examples of order statistics. The minimum and maximum observations are important in extreme value theory. This paper will …


Spatiotemporal Analyses Of Recycled Water Production, Jana E. Archer May 2017

Electronic Theses and Dissertations

Increased demands on water supplies caused by population expansion, saltwater intrusion, and drought have led to water shortages which may be addressed by use of recycled water as recycled water products. Study I investigated recycled water production in Florida and California during 2009 to detect gaps in distribution and identify areas for expansion. Gaps were detected along the panhandle and Miami, Florida, as well as the northern and southwestern regions in California. Study II examined gaps in distribution, identified temporal change, and located areas for expansion for Florida in 2009 and 2015. Production increased in the northern and southern regions …


Performance Of Imputation Algorithms On Artificially Produced Missing At Random Data, Tobias O. Oketch May 2017

Electronic Theses and Dissertations

Missing data is one of the challenges we face today in building valid statistical models. It reduces the representativeness of the data samples. Hence, population estimates and model parameters estimated from such data are likely to be biased.

However, the missing data problem is an area under study, and better alternative statistical procedures have been presented to mitigate its shortcomings. In this paper, we review causes of missing data and various methods of handling missing data. Our main focus is evaluating various multiple imputation (MI) methods from the Multivariate Imputation by Chained Equations (MICE) package in the statistical software …


A Multi-Indexed Logistic Model For Time Series, Xiang Liu Dec 2016

Electronic Theses and Dissertations

In this thesis, we explore a multi-indexed logistic regression (MILR) model, with particular emphasis given to its application to time series. MILR includes simple logistic regression (SLR) as a special case, and the hope is that it will in some instances also produce significantly better results. To motivate the development of MILR, we consider its application to the analysis of both simulated sine wave data and stock data. We looked at well-studied SLR and its application in the analysis of time series data. Using a more sophisticated representation of sequential data, we then detail the implementation of MILR. We compare …


Tornado Density And Return Periods In The Southeastern United States: Communicating Risk And Vulnerability At The Regional And State Levels, Michelle Bradburn Aug 2016

Electronic Theses and Dissertations

Tornado intensity and impacts vary drastically across space, thus spatial and statistical analyses were used to identify patterns of tornado severity in the Southeastern United States and to assess the vulnerability and estimated recurrence of tornadic activity. Records from the Storm Prediction Center's tornado database (1950-2014) were used to estimate kernel density to identify areas of high and low tornado frequency at both the regional- and state-scales. Return periods (2-year, 5-year, 10-year, 25-year, 50-year, and 100-year) were calculated at both scales as well using a composite score that included EF-scale magnitude, injury counts, and fatality counts. Results showed that the …


Newsvendor Models With Monte Carlo Sampling, Ijeoma W. Ekwegh Aug 2016

Electronic Theses and Dissertations

The newsvendor model is used in solving inventory problems in which demand is random. In this thesis, we will focus on a method of using Monte Carlo sampling to estimate the order quantity that either maximizes revenue or minimizes cost given that demand is uncertain. Given data, the Monte Carlo approach will be used in sampling data over scenarios and also estimating the probability density function. A bootstrapping process yields an empirical distribution for the order quantity that will maximize the expected profit. Finally, this method will be used …
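
A bootstrapped empirical distribution for the order quantity, as mentioned above, might be sketched as follows. This is not the thesis's procedure: it swaps in a simple empirical-quantile rule (ordering at the critical-ratio quantile of observed demand, with no salvage value) in place of density estimation, and the demand history, prices, and function names are all hypothetical.

```python
import random


def empirical_order_quantity(demands, unit_cost, unit_price):
    """Order quantity at the critical-ratio quantile of the observed demand."""
    critical_ratio = (unit_price - unit_cost) / unit_price  # assumes zero salvage
    ordered = sorted(demands)
    index = min(int(critical_ratio * len(ordered)), len(ordered) - 1)
    return ordered[index]


def bootstrap_order_quantities(demands, unit_cost, unit_price, n_boot, rng):
    """Re-estimate the order quantity on resampled demand histories."""
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(demands, k=len(demands))  # sample with replacement
        estimates.append(empirical_order_quantity(resample, unit_cost, unit_price))
    return estimates


rng = random.Random(42)
demands = [rng.gauss(100, 20) for _ in range(200)]  # hypothetical demand history
estimates = sorted(bootstrap_order_quantities(demands, 4, 10, 1000, rng))

# Rough 95% percentile interval for the order quantity.
print(round(estimates[25], 1), round(estimates[975], 1))
```

The spread of the bootstrap estimates gives a sense of how sensitive the suggested order quantity is to the particular demand sample that was observed.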