Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 20 of 20

Full-Text Articles in Physical Sciences and Mathematics

Exploration And Statistical Modeling Of Profit, Caleb Gibson Dec 2023

Undergraduate Honors Theses

For any company involved in sales, maximization of profit is the driving force that guides all decision-making. Many factors can influence how profitable a company can be, including external factors like changes in inflation or consumer demand and internal factors like pricing and product cost. By understanding specific trends in its own internal data, a company can readily identify problem areas or potential growth opportunities to help increase profitability.

In this discussion, we use an extensive data set to examine how a company might analyze its own data to identify changes it might investigate to drive better performance. Based …


Finding A Representative Distribution For The Tail Index Alpha, α, For Stock Return Data From The New York Stock Exchange, Jett Burns May 2022

Electronic Theses and Dissertations

Statistical inference is a tool for creating models that can accurately describe real-world events. Special importance is given to the financial methods that model risk and large price movements. A parameter that describes tail heaviness, and risk overall, is α. This research finds a representative distribution that models α. The absolute values of standardized stock returns from the Center for Research in Security Prices (CRSP) are used in this research. The inference is performed using R. Approximations for α are found using the ptsuite package. The GAMLSS package employs maximum likelihood estimation to estimate distribution parameters using the CRSP data. The …
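As a rough illustration of what estimating a tail index involves, the sketch below applies the Hill estimator in Python. This is an assumption made for illustration only: the abstract does not say which estimator was used, and the work itself was carried out in R with ptsuite and GAMLSS. The function name `hill_tail_index`, the choice of `k`, and the simulated Pareto sample are all hypothetical.

```python
import math
import random


def hill_tail_index(values, k):
    """Hill estimate of the tail index alpha from the k largest observations."""
    ordered = sorted(values, reverse=True)
    if not 0 < k < len(ordered):
        raise ValueError("k must satisfy 0 < k < len(values)")
    threshold = ordered[k]  # the (k+1)-th largest value acts as the threshold
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    # Average log-excess of the k largest values over the threshold.
    mean_log_excess = sum(math.log(x / threshold) for x in ordered[:k]) / k
    return 1.0 / mean_log_excess


if __name__ == "__main__":
    # Simulated data standing in for absolute standardized returns (assumption).
    rng = random.Random(0)
    sample = [rng.paretovariate(3.0) for _ in range(20000)]
    print(hill_tail_index(sample, k=2000))
```

The estimate depends on `k`, the number of upper order statistics treated as the tail, so in practice it is usually examined over a range of `k` values rather than at a single one.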


Intraday Algorithmic Trading Using Momentum And Long Short-Term Memory Network Strategies, Andrew R. Whitinger II May 2022

Undergraduate Honors Theses

Intraday stock trading is an infamously difficult and risky strategy. Momentum and reversal strategies and long short-term memory (LSTM) neural networks have been shown to be effective for selecting stocks to buy and sell over time periods of multiple days. To explore whether these strategies can be effective for intraday trading, their implementations were simulated using intraday price data for stocks in the S&P 500 index, collected at 1-second intervals between February 11, 2021 and March 9, 2021 inclusive. The study tested 160 variations of momentum and reversal strategies for profitability in long, short, and market-neutral portfolios, totaling 480 portfolios. …


Performance Comparison Of Imputation Methods For Mixed Data Missing At Random With Small And Large Sample Data Set With Different Variability, Kyei Afari Aug 2021

Electronic Theses and Dissertations

One of the concerns in the field of statistics is the presence of missing data, which leads to bias in parameter estimation and inaccurate results. However, the multiple imputation procedure is a remedy for handling missing data. This study looked at the best multiple imputation methods used to handle mixed variable datasets with different sample sizes and variability along with different levels of missingness. The study employed the predictive mean matching, classification and regression trees, and the random forest imputation methods. For each dataset, the multiple regression parameter estimates for the complete datasets were compared to the multiple regression parameter …


Function Space Tensor Decomposition And Its Application In Sports Analytics, Justin Reising Dec 2019

Electronic Theses and Dissertations

Recent advancements in sports information and technology systems have ushered in a new age of applications of both supervised and unsupervised analytical techniques in the sports domain. These automated systems capture large volumes of data points about competitors during live competition. As a result, multi-relational analyses are gaining popularity in the field of Sports Analytics. We review two case studies of dimensionality reduction with Principal Component Analysis and latent factor analysis with Non-Negative Matrix Factorization applied in sports. Also, we provide a review of a framework for extending these techniques for higher order data structures. The primary scope of this …


Communications And Methodologies In Crime Geography: Contemporary Approaches To Disseminating Criminal Incidence And Research, Mitchell Ogden Dec 2019

Electronic Theses and Dissertations

Many tools exist to assist law enforcement agencies in mitigating criminal activity. For centuries, academics used statistics in the study of crime and criminals, and more recently, police departments make use of spatial statistics and geographic information systems in that pursuit. Clustering and hot spot methods of analysis are popular in this application for their relative simplicity of interpretation and ease of process. With recent advancements in geospatial technology, it is easier than ever to publicly share data through visual communication tools like web applications and dashboards. Sharing data and results of analyses boosts transparency and the public image of …


A Comparison Of Standard Denoising Methods For Peptide Identification, Skylar Carpenter May 2019

Electronic Theses and Dissertations

Peptide identification using tandem mass spectrometry depends on matching the observed spectrum with the theoretical spectrum. The raw data from tandem mass spectrometry, however, is often not optimal because it may contain noise or measurement errors. Denoising this data can improve alignment between observed and theoretical spectra and reduce the number of peaks. The method used by Lewis et al. (2018) uses a combined constant and moving threshold to denoise spectra. We compare the effects of using the standard preprocessing methods baseline removal, wavelet smoothing, and binning on spectra with Lewis et al.’s threshold method. We consider individual methods and …


Clustering Mixed Data: An Extension Of The Gower Coefficient With Weighted L2 Distance, Augustine Oppong Aug 2018

Electronic Theses and Dissertations

Sorting data into partitions is becoming increasingly complex as the constituents of data expand every day. Mixed data comprises continuous, categorical, directional, functional, and other types of variables. Clustering mixed data is based on special dissimilarities of the variables. Some data types may influence the clustering solution. Assigning appropriate weight to the functional data may improve the performance of the clustering algorithm. In this paper we use the extension of the Gower coefficient with a judiciously chosen weight for the L2 distance to cluster mixed data. The benefits of weighting are demonstrated both in applications to the Buoy data set …
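For readers unfamiliar with the Gower coefficient, the following Python sketch computes the standard weighted Gower dissimilarity between two records with numeric and categorical variables. It does not reproduce the thesis's extension to functional data with a weighted L2 distance; the function name, argument layout, and example records are assumptions introduced for illustration.

```python
def gower_dissimilarity(a, b, kinds, spans, weights=None):
    """Weighted Gower dissimilarity between two mixed-type records.

    kinds[i] is "num" or "cat"; spans[i] is the observed range of a numeric
    variable over the whole data set (ignored for categorical variables).
    """
    if weights is None:
        weights = [1.0] * len(a)
    total = 0.0
    for x, y, kind, span, w in zip(a, b, kinds, spans, weights):
        if kind == "num":
            # Range-normalized absolute difference, in [0, 1].
            d = abs(x - y) / span if span > 0 else 0.0
        else:
            # Simple matching for categorical variables.
            d = 0.0 if x == y else 1.0
        total += w * d
    return total / sum(weights)


if __name__ == "__main__":
    kinds = ["num", "cat"]
    spans = [4.0, None]
    print(gower_dissimilarity((1.0, "red"), (3.0, "blue"), kinds, spans))
    print(gower_dissimilarity((1.0, "red"), (3.0, "blue"), kinds, spans,
                              weights=[3.0, 1.0]))
```

Changing `weights` shifts how much each variable type contributes to the overall dissimilarity, which is the lever the abstract refers to when it discusses weighting.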


A Comparison Of Unsupervised Methods For DNA Microarray Leukemia Data, Denise Harness Apr 2018

Appalachian Student Research Forum

Advancements in DNA microarray data sequencing have created the need for sophisticated machine learning algorithms and feature selection methods. Probabilistic graphical models, in particular, have been used to identify whether microarrays or genes cluster together in groups of individuals having a similar diagnosis. These clusters of genes are informative, but can be misleading when every gene is used in the calculation. First, feature reduction techniques are explored; however, the size and nature of the data prevent traditional techniques from working efficiently. Our method is to use the partial correlations between the features to create a precision matrix and predict which …


Performance Of Imputation Algorithms On Artificially Produced Missing At Random Data, Tobias O. Oketch May 2017

Electronic Theses and Dissertations

Missing data is one of the challenges we face today in building valid statistical models. It reduces the representativeness of the data samples. Hence, population estimates and model parameters estimated from such data are likely to be biased.

However, the missing data problem is an area under study, and better alternative statistical procedures have been presented to mitigate its shortcomings. In this paper, we review causes of missing data and various methods of handling missing data. Our main focus is evaluating various multiple imputation (MI) methods from the Multivariate Imputation by Chained Equations (MICE) package in the statistical software …


Denoising Tandem Mass Spectrometry Data, Felix Offei May 2017

Electronic Theses and Dissertations

Protein identification using tandem mass spectrometry (MS/MS) has proven to be an effective way to identify proteins in a biological sample. An observed spectrum is constructed from the data produced by the tandem mass spectrometer. A protein can be identified if the observed spectrum aligns with the theoretical spectrum. However, data generated by the tandem mass spectrometer are affected by errors, which makes protein identification challenging in the field of proteomics. Some of these errors include miscalibration of the instrument, instrument distortion, and noise. In this thesis, we present a pre-processing method, which focuses on the removal of noisy …


A Multi-Indexed Logistic Model For Time Series, Xiang Liu Dec 2016

Electronic Theses and Dissertations

In this thesis, we explore a multi-indexed logistic regression (MILR) model, with particular emphasis given to its application to time series. MILR includes simple logistic regression (SLR) as a special case, and the hope is that it will in some instances also produce significantly better results. To motivate the development of MILR, we consider its application to the analysis of both simulated sine wave data and stock data. We look at the well-studied SLR and its application in the analysis of time series data. Using a more sophisticated representation of sequential data, we then detail the implementation of MILR. We compare …


Multilevel Models For Longitudinal Data, Aastha Khatiwada Aug 2016

Electronic Theses and Dissertations

Longitudinal data arise when individuals are measured several times during an observation period and thus the data for each individual are not independent. There are several ways of analyzing longitudinal data when different treatments are compared. Multilevel models are used to analyze data that are clustered in some way. In this work, multilevel models are used to analyze longitudinal data from a case study. Results from other more commonly used methods are compared to multilevel models. Also, a comparison of output between two software packages, SAS and R, is made. Finally, a method consisting of fitting individual models for each …


Spatio-Temporal Analysis Of Point Patterns, Abdul-Nasah Soale Aug 2016

Electronic Theses and Dissertations

In this thesis, the basic tools of spatial statistics and time series analysis are applied to the case study of the earthquakes in a certain geographical region and time frame. Then some of the existing methods for joint analysis of time and space are described and applied. Finally, additional research questions about the spatial-temporal distribution of the earthquakes are posed and explored using statistical plots and models. The focus in the last section is on the relationship between the number of events per year and maximum magnitude and its effect on how clustered the spatial distribution is and the relationship between …


Newsvendor Models With Monte Carlo Sampling, Ijeoma W. Ekwegh Aug 2016

Electronic Theses and Dissertations

The newsvendor model is used in solving inventory problems in which demand is random. In this thesis, we will focus on a method of using Monte Carlo sampling to estimate the order quantity that either maximizes revenue or minimizes cost given that demand is uncertain. Given data, the Monte Carlo approach will be used in sampling data over scenarios and also estimating the probability density function. A bootstrapping process yields an empirical distribution for the order quantity that will maximize the expected profit. Finally, this method will be used …
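The idea of estimating an order quantity from sampled demand, and then bootstrapping it, can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the thesis's procedure: it assumes a unit selling price, a unit cost, and no salvage value, and all function names and parameter values are hypothetical.

```python
import math
import random


def expected_profit(q, demands, price, cost):
    """Sample-average profit of ordering q units (no salvage value assumed)."""
    return sum(price * min(q, d) - cost * q for d in demands) / len(demands)


def newsvendor_quantity(demands, price, cost):
    """Order quantity at the empirical critical fractile (price - cost) / price."""
    ratio = (price - cost) / price
    ordered = sorted(demands)
    # Smallest sampled demand whose empirical CDF reaches the critical ratio.
    index = max(0, math.ceil(ratio * len(ordered)) - 1)
    return ordered[index]


def bootstrap_quantities(demands, price, cost, reps, rng):
    """Recompute the order quantity on resamples drawn with replacement."""
    n = len(demands)
    return [
        newsvendor_quantity(rng.choices(demands, k=n), price, cost)
        for _ in range(reps)
    ]


if __name__ == "__main__":
    rng = random.Random(42)
    # Simulated demand scenarios (assumption; the thesis works from data).
    demands = [rng.gauss(100, 20) for _ in range(1000)]
    q = newsvendor_quantity(demands, price=10.0, cost=4.0)
    print(q, expected_profit(q, demands, 10.0, 4.0))
    print(sorted(bootstrap_quantities(demands, 10.0, 4.0, 200, rng))[:5])
```

The spread of the bootstrapped quantities gives a sense of how sensitive the recommended order is to the particular demand sample.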


Analyses Of 2002-2013 China’s Stock Market Using The Shared Frailty Model, Chao Tang Aug 2014

Electronic Theses and Dissertations

This thesis adopts a survival model to analyze China’s stock market. The data used are the capitalization-weighted stock market index (CSI 300) and the 300 stocks used to create the index. We define the recurrent events using the daily return of the selected stocks and the index. A shared frailty model, which incorporates random effects, is then used for the analyses since the survival times of individual stocks are correlated. Maximization of penalized likelihood is presented to estimate the parameters in the model. The covariates are selected using the Akaike information criterion (AIC) and the variance inflation factor (VIF) to avoid …


Are Highly Dispersed Variables More Extreme? The Case Of Distributions With Compact Support, Benedict E. Adjogah May 2014

Electronic Theses and Dissertations

We consider discrete and continuous symmetric random variables X taking values in [0, 1], and thus having expected value 1/2. The main thrust of this investigation is to study the correlation between the variance, Var(X), of X and the value of the expected maximum E(Mn) = E(max(X1,...,Xn)) of n independent and identically distributed random variables X1,X2,...,Xn, each distributed as X. Many special cases are studied, some leading to very interesting alternating sums, and some progress is made towards a general theory.
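Two textbook special cases, worked here only as an illustration (the abstract does not say which cases the thesis treats), show the kind of comparison involved. Both variables are symmetric on [0, 1] with mean 1/2. The labels U, for a uniform variable, and B, for a variable taking the values 0 and 1 with probability 1/2 each, are introduced for this sketch.

```latex
\begin{align*}
\operatorname{Var}(U) &= \tfrac{1}{12}, &
E\bigl(M_n^{U}\bigr) &= \int_0^1 x \, n x^{n-1} \, dx = \frac{n}{n+1}, \\
\operatorname{Var}(B) &= \tfrac{1}{4}, &
E\bigl(M_n^{B}\bigr) &= 1 - P(B_1 = \dots = B_n = 0) = 1 - 2^{-n}.
\end{align*}
```

Since 2^n ≥ n + 1 for every n ≥ 1, the more dispersed variable B has the larger expected maximum in this pair, with equality only at n = 1.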


Solving The Differential Equation For The Probit Function Using A Variant Of The Carleman Embedding Technique, Kelechukwu Iroajanma Alu May 2011

Electronic Theses and Dissertations

The probit function is the inverse of the cumulative distribution function associated with the standard normal distribution. It is of great utility in statistical modelling. The Carleman embedding technique has been shown to be effective in solving first order and, less efficiently, second order nonlinear differential equations. In this thesis, we show that solutions to the second order nonlinear differential equation for the probit function can be approximated efficiently using a variant of the Carleman embedding technique.
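The abstract does not state the differential equation itself. A standard derivation, given here as a sketch with w(p) denoting the probit function and Φ and φ the standard normal distribution and density functions, runs as follows.

```latex
\begin{align*}
\Phi\bigl(w(p)\bigr) = p
  &\;\Longrightarrow\; \varphi\bigl(w(p)\bigr)\, w'(p) = 1,
  \qquad \varphi(w) = \frac{1}{\sqrt{2\pi}}\, e^{-w^{2}/2}, \\
w'(p) &= \sqrt{2\pi}\, e^{w(p)^{2}/2}, \\
w''(p) &= \sqrt{2\pi}\, e^{w(p)^{2}/2}\, w(p)\, w'(p) = w(p)\,\bigl(w'(p)\bigr)^{2}.
\end{align*}
```

The resulting equation w'' = w (w')^2 is second order and nonlinear, with initial conditions w(1/2) = 0 and w'(1/2) = √(2π).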


Early Stopping Of A Neural Network Via The Receiver Operating Curve, Daoping Yu Aug 2010

Electronic Theses and Dissertations

This thesis presents the area under the ROC (Receiver Operating Characteristic) curve, abbreviated AUC, as an alternative measure for evaluating the predictive performance of artificial neural network (ANN) classifiers. Conventionally, neural networks are trained to have total error converge to zero, which may give rise to overfitting problems. To ensure that they do not overfit the training data and then fail to generalize well to new data, it appears effective to stop training as early as possible once the AUC is sufficiently large, by integrating ROC/AUC analysis into the training process. In order to reduce learning costs involving the …
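A minimal Python sketch of the idea follows: compute the AUC from classifier scores, then stop at the first epoch whose AUC reaches a chosen target. The stopping rule, the target value, and both function names are assumptions for illustration; the abstract is truncated and does not give the thesis's exact criterion.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Fraction of (positive, negative) pairs ranked in the correct order.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def first_epoch_reaching(auc_by_epoch, target):
    """Return the first epoch index whose AUC reaches target, or None."""
    for epoch, value in enumerate(auc_by_epoch):
        if value >= target:
            return epoch
    return None


if __name__ == "__main__":
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
    # Hypothetical per-epoch AUC values from a training run.
    print(first_epoch_reaching([0.60, 0.80, 0.95, 0.97], target=0.90))
```

In a real training loop the AUC would be recomputed on held-out data after each epoch, and training would halt as soon as `first_epoch_reaching` would return a value.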


Examining Significant Differences Of Gunshot Residue Patterns Using Same Make And Model Of Firearms In Forensic Distance Determination Tests, Heather Lewey Dec 2007

Electronic Theses and Dissertations

In many cases of crimes involving a firearm, police investigators need to know how far the firearm was held from the victim when it was discharged. Once this distance is known, vital questions regarding the reconstruction of the crime scene can be answered. Often, the original firearm used in commission of a suspected crime is not available for testing or is damaged. Crime laboratories require the original firearm in order to conduct distance determination tests. However, no empirical research has ever been conducted to determine if same make and model firearms produce different results in distance determination testing. It was the purpose …