Cointegration And Statistical Arbitrage Of Precious Metals, 2021 University of Arkansas, Fayetteville

#### Cointegration And Statistical Arbitrage Of Precious Metals, Judge Van Horn

*Finance Undergraduate Honors Theses*

When talking about financial instruments, correlation is often thrown around as a measure of the relation between two securities. An often more useful, or tradeable, measure is cointegration. Cointegration is the measure of two securities' tendency to revert to an average price over time. In other words, cointegration ignores directionality and only cares about the distance between two securities. For a mean reversion strategy such as statistical arbitrage, cointegration proves to be a far more reliable statistical measure of mean reversion; while it is more reliable than correlation, it still has its own problems. One thing to consider is ...
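The idea of tracking the distance between two securities can be sketched in a few lines. The example below is a minimal illustration, not the thesis's method: it estimates a hedge ratio by ordinary least squares and converts the resulting spread into z-scores. The names `hedge_ratio` and `spread_zscores` and the price series are assumptions invented for illustration, and a z-score of the spread is not a formal cointegration test (an Engle-Granger style test would additionally check the spread for stationarity).

```python
from statistics import mean, pstdev


def hedge_ratio(x, y):
    """Ordinary least squares slope of y on x (regression with an intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def spread_zscores(x, y):
    """Return the hedge ratio and the z-scores of the spread y - beta * x."""
    beta = hedge_ratio(x, y)
    spread = [b - beta * a for a, b in zip(x, y)]
    m, s = mean(spread), pstdev(spread)
    return beta, [(v - m) / s for v in spread]


# Hypothetical price series, invented for illustration only.
gold = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
silver = [3.1, 4.9, 7.1, 8.9, 11.1, 12.9]

beta, z = spread_zscores(gold, silver)
print(round(beta, 3), [round(v, 2) for v in z])
```

A mean reversion strategy would typically act when the absolute z-score is large and close the position as the spread returns toward its mean; the thresholds are a design choice that the abstract does not specify.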

Markov Chains And Their Applications, 2021 University of Texas at Tyler

#### Markov Chains And Their Applications, Fariha Mahfuz

*Math Theses*

A Markov chain is a stochastic model that is used to predict future events. A Markov chain is relatively simple since it requires only the information of the present state to predict future states. In this paper we go over the basic concepts of Markov chains and several of their applications, including the Google PageRank algorithm, weather prediction, and gambler's ruin.

We examine how the Google PageRank algorithm works efficiently to provide PageRank for a Google search result. We also show how we can use a Markov chain to predict weather by creating a model from real-life data.
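The property that only the present state is needed can be shown with a two-state weather chain. This is a minimal sketch under assumed numbers: the transition matrix `P` is hypothetical and is not taken from the thesis or from real weather data, and the function names `step` and `forecast` are invented for illustration.

```python
def step(dist, P):
    """Advance one day: new_dist[j] = sum over i of dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def forecast(dist, P, days):
    """Apply the one-step update repeatedly to get the distribution after `days` days."""
    for _ in range(days):
        dist = step(dist, P)
    return dist


# Hypothetical transition matrix: rows are today's state, columns are tomorrow's.
# State 0 = sunny, state 1 = rainy. Each row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]

today = [1.0, 0.0]                    # it is sunny today
two_days = forecast(today, P, 2)      # approximately [0.86, 0.14]
long_run = forecast(today, P, 200)    # approaches the stationary distribution [5/6, 1/6]
print(two_days, long_run)
```

The long-run distribution does not depend on the starting state for this chain, which is the same fixed-point idea that PageRank relies on when ranking pages.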

Predicting The Number Of Future Events, 2021 Iowa State University

#### Predicting The Number Of Future Events, Qinglong Tian, Fanqi Meng, Daniel J. Nordman, William Q. Meeker

*Statistics Publications*

This paper describes prediction methods for the number of future events from a population of units associated with an on-going time-to-event process. Examples include the prediction of warranty returns and the prediction of the number of future product failures that could cause serious threats to property or life. Important decisions such as whether a product recall should be mandated are often based on such predictions. Data, generally right-censored (and sometimes left truncated and right-censored), are used to estimate the parameters of a time-to-event distribution. This distribution can then be used to predict the number of events over future periods of ...

Sars-Cov-2 Pandemic Analytical Overview With Machine Learning Predictability, 2021 Southern Methodist University

#### Sars-Cov-2 Pandemic Analytical Overview With Machine Learning Predictability, Anthony Tanaydin, Jingchen Liang, Daniel W. Engels

*SMU Data Science Review*

Understanding diagnostic tests and examining important features of novel coronavirus (COVID-19) infection are essential steps for controlling the current pandemic of 2020. In this paper, we study the relationship between clinical diagnosis and analytical features of patient blood panels from the US, Mexico, and Brazil. Our analysis confirms that among adults, the risk of severe illness from COVID-19 increases with pre-existing conditions such as diabetes and immunosuppression. More than eight months into the pandemic, more data have become available, indicating that more young adults were getting infected. In addition, we expand on the definition of a COVID-19 test and discuss ...

An Evaluation Of Knot Placement Strategies For Spline Regression, 2021 Claremont Colleges

#### An Evaluation Of Knot Placement Strategies For Spline Regression, William Klein

*CMC Senior Theses*

Regression splines have an established value for producing quality fit at a relatively low-degree polynomial. This paper explores the implications of adopting new methods for knot selection in tandem with established methodology from the current literature. Structural features of generated datasets, as well as residuals collected from sequential iterative models, are used to augment the equidistant knot selection process. From analyzing a simulated dataset and an application to the Racial Animus dataset, I find that a B-spline basis paired with equally spaced knots remains the best choice when data are evenly distributed, even when structural features of a dataset are known ...

Joint Modeling Of Distances And Times In Point-Count Surveys, 2021 University of Michigan

#### Joint Modeling Of Distances And Times In Point-Count Surveys, Adam Martin-Schwarze, Jarad Niemi, Philip Dixon

*Statistics Publications*

Removal and distance modeling are two common methods to adjust counts for imperfect detection in point-count surveys. Several recent articles have formulated models to combine them into a distance-removal framework. We observe that these models fall into two groups building from different assumptions about the joint distribution of observed distances and first times to detection. One approach assumes the joint distribution results from a Poisson process (PP). The other assumes an independent joint (IJ) distribution with its joint density being the product of its marginal densities. We compose an IJ+PP model that more flexibly models the joint distribution and ...

Power And Statistical Significance In Securities Fraud Litigation, 2021 University of Pennsylvania Carey Law School

#### Power And Statistical Significance In Securities Fraud Litigation, Jill E. Fisch, Jonah B. Gelbach

*Faculty Scholarship at Penn Law*

Event studies, a half-century-old approach to measuring the effect of events on stock prices, are now ubiquitous in securities fraud litigation. In determining whether the event study demonstrates a price effect, expert witnesses typically base their conclusion on whether the results are statistically significant at the 95% confidence level, a threshold that is drawn from the academic literature. As a positive matter, this represents a disconnect with legal standards of proof. As a normative matter, it may reduce enforcement of fraud claims because litigation event studies typically involve quite low statistical power even for large-scale frauds.

This paper, written for ...

A Bayesian Hierarchical Mixture Model With Continuous-Time Markov Chains To Capture Bumblebee Foraging Behavior, 2021 Bowdoin College

#### A Bayesian Hierarchical Mixture Model With Continuous-Time Markov Chains To Capture Bumblebee Foraging Behavior, Max Thrush Hukill

*Honors Projects*

The standard statistical methodology for analyzing complex case-control studies in ethology is often limited by approaches that force researchers to model distinct aspects of biological processes in a piecemeal, disjointed fashion. By developing a hierarchical Bayesian model, this work demonstrates that statistical inference in this context can be done using a single coherent framework. To do this, we construct a continuous-time Markov chain (CTMC) to model bumblebee foraging behavior. To connect the experimental design with the CTMC, we employ a mixture model controlled by a logistic regression on the two-factor design matrix. We then show how to infer these model ...

An Updated Method For Correcting Batch Effect, 2021 Iowa State University

#### An Updated Method For Correcting Batch Effect, Yonghui Huo

*Creative Components*

I propose a novel variation (Pro-SVA) on iteratively reweighted surrogate variable analysis (IRW-SVA) for detecting and measuring batch effects in high dimensional gene expression data. Specifically, I propose to use the matrix-free high dimensional factor analysis (HDFA) algorithm instead of singular value decomposition (SVD) in the IRW-SVA iterations. HDFA efficiently provides the maximum likelihood estimates of the error variances and batch loadings, which can subsequently be used to estimate the batch factors. To evaluate the performance of Pro-SVA, I simulated 100 samples of 1,000 genes with batch effects and (1) no biological effects, (2) biological effects for half ...

An Efficient Algorithm For Kernel K Means, 2021 Iowa State University

#### An Efficient Algorithm For Kernel K Means, Joshua Berlinski

*Creative Components*

Kernel *K*-means extends the standard *K*-means clustering method to identify non-spherical clusters by performing the algorithm in a higher dimensional feature space. Typically, this extension is implemented using a method based on Lloyd's heuristic. A method based on Hartigan and Wong's heuristic is presented here, which improves the run time required to reach the final clustering. Additionally, methods for selecting the number of clusters and the tuning parameter for the Gaussian kernel are discussed. An adaptation of the *K*-means++ initialization method is also presented and discussed. Each of the methods is evaluated and compared on ...
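For orientation, the Lloyd-style baseline mentioned above can be sketched as follows. This is not the Hartigan and Wong variant the work presents; it is a minimal batch-update version under stated assumptions. The function names, the seed-based initialization, and the toy data are invented for illustration. The squared feature-space distance from a point to a cluster centroid is computed from kernel values alone as K(i,i) - 2 * mean_j K(i,j) + mean_{j,l} K(j,l), with j and l ranging over the cluster's members.

```python
import math


def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel between two points given as tuples."""
    d2 = sum((u - v) ** 2 for u, v in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def kernel_kmeans(points, seeds, sigma=1.0, max_iter=100):
    """Lloyd-style kernel K-means; `seeds` are indices of the initial seed points."""
    n, k = len(points), len(seeds)
    K = [[gaussian_kernel(p, q, sigma) for q in points] for p in points]
    # Initial labels: assign each point to the seed with the largest kernel value.
    labels = [max(range(k), key=lambda c: K[i][seeds[c]]) for i in range(n)]
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Mean kernel value over all member pairs of each cluster.
        within = [
            sum(K[a][b] for a in m for b in m) / len(m) ** 2 if m else 0.0
            for m in members
        ]
        new_labels = []
        for i in range(n):
            best, best_d = labels[i], math.inf
            for c, m in enumerate(members):
                if not m:
                    continue  # an empty cluster cannot attract points
                cross = sum(K[i][j] for j in m) / len(m)
                d = K[i][i] - 2.0 * cross + within[c]
                if d < best_d:
                    best, best_d = c, d
            new_labels.append(best)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels


# Toy one-dimensional data with two well-separated groups (illustrative only).
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
labels = kernel_kmeans(points, seeds=[0, 5], sigma=1.0)
print(labels)
```

Every pass reassigns all points at once from the full kernel matrix; a Hartigan and Wong style method instead considers moving one point at a time, which is where the run-time improvement described above would come from.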

Upper-Sided Ewma-Based Distribution-Specific Tolerance Limits, 2021 University of North Florida

#### Upper-Sided Ewma-Based Distribution-Specific Tolerance Limits, Owen Visser

*UNF Graduate Theses and Dissertations*

Tolerance limits are constructed from sample data to ascertain if a proportion of a process is within specification limits. There exist multiple methods of calculating the sample size requirements for tolerance limits under various assumptions. In this research, a distribution-specific algorithm that utilizes the exponentially weighted moving average technique (EWMA), first introduced by Sa and Razaila (2004), is reconstructed. The algorithm is used to calculate the required sample sizes for continuous construction of upper-sided tolerance limits. The sample sizes and intervals constructed from them are compared to three existing methods for various distributions. The distribution-specific algorithm was observed to reduce ...

Modeling And Inference For Mixtures Of Simple Symmetric Exponential Families Of P-Dimensional Distributions For Vectors With Binary Coordinates, 2021 Lawrence University

#### Modeling And Inference For Mixtures Of Simple Symmetric Exponential Families Of P-Dimensional Distributions For Vectors With Binary Coordinates, Abhishek Chakraborty, Stephen B. Vardeman

*Statistics Publications*

We propose tractable symmetric exponential families of distributions for multivariate vectors of 0's and 1's in *p* dimensions, or what are referred to in this paper as binary vectors, that allow for nontrivial amounts of variation around some central value. We note that more or less standard asymptotics provides likelihood-based inference in the one-sample problem. We then consider mixture models where component distributions are of this form. Bayes analysis based on Dirichlet processes and Jeffreys priors for the exponential family parameters proves tractable and informative in problems where relevant distributions for a vector of binary variables are clearly not ...

Fracture Mechanics-Based Quantitative Matching Of Forensic Evidence Fragments, 2021 Indiana University

#### Fracture Mechanics-Based Quantitative Matching Of Forensic Evidence Fragments, Geoffrey Z. Thompson, Bishoy Dawood, Tianyu Yu, Barbara K. Lograsso, John D. Vanderkolk, Ranjan Maitra, William Q. Meeker, Ashraf F. Bastawros

*Aerospace Engineering Publications*

Fractured metal fragments with rough and irregular surfaces are often found at crime scenes. Current forensic practice visually inspects the complex jagged trajectory of fractured surfaces to recognize a "match" using comparative microscopy and physical pattern analysis. We developed a novel computational framework, utilizing the basic concepts of fracture mechanics and statistical analysis to provide quantitative match analysis for match probability and error rates. The framework employs the statistics of fracture surfaces to become non-self-affine with unique roughness characteristics at relevant microscopic length scale, dictated by the intrinsic material resistance to fracture and its microstructure. At such a scale, which ...

Evaluation Of A Random Forest Model To Identify Invasive Carp Eggs Based On Morphometric Features, 2021 Iowa State University

#### Evaluation Of A Random Forest Model To Identify Invasive Carp Eggs Based On Morphometric Features, Katherine Goode, Michael J. Weber, Aaron Matthews, Clay L. Pierce

*Natural Resource Ecology and Management Publications*

Three species of invasive carp—Grass Carp *Ctenopharyngodon idella*, Silver Carp *Hypophthalmichthys molitrix*, and Bighead Carp *H. nobilis*—are rapidly spreading throughout North America. Monitoring their reproduction can help to determine establishment in new areas but is difficult due to challenges associated with identifying fish eggs. Recently, random forest models provided accurate identification of eggs based on morphological traits, but the models have not been validated using independent data. Our objective was to evaluate the predictive performance of egg identification models developed by Camacho et al. (2019) for classifying invasive carp eggs by using an independent data set. When invasive ...

Bayesian Semi-Supervised Keyphrase Extraction And Jackknife Empirical Likelihood For Assessing Heterogeneity In Meta-Analysis, 2020 Southern Methodist University

#### Bayesian Semi-Supervised Keyphrase Extraction And Jackknife Empirical Likelihood For Assessing Heterogeneity In Meta-Analysis, Guanshen Wang

*Statistical Science Theses and Dissertations*

This dissertation investigates: (1) A Bayesian Semi-supervised Approach to Keyphrase Extraction with Only Positive and Unlabeled Data, (2) Jackknife Empirical Likelihood Confidence Intervals for Assessing Heterogeneity in Meta-analysis of Rare Binary Events.

In the big data era, people are blessed with a huge amount of information. However, the availability of information may also pose great challenges. One big challenge is how to extract useful yet succinct information in an automated fashion. As one of the first few efforts, keyphrase extraction methods summarize an article by identifying a list of keyphrases. Many existing keyphrase extraction methods focus on the unsupervised setting ...

Improved Statistical Methods For Time-Series And Lifetime Data, 2020 Southern Methodist University

#### Improved Statistical Methods For Time-Series And Lifetime Data, Xiaojie Zhu

*Statistical Science Theses and Dissertations*

In this dissertation, improved statistical methods for time-series and lifetime data are developed. First, an improved trend test for time series data is presented. Then, robust parametric estimation methods based on system lifetime data with known system signatures are developed.

In the first part of this dissertation, we consider a test for the monotonic trend in time series data proposed by Brillinger (1989). It has been shown that when there are highly correlated residuals or short record lengths, Brillinger’s test procedure tends to have a significance level much higher than the nominal level. This could be related to the discrepancy ...

Concerns In Id'ing A Suitable Distribution, 2020 Siena College School of Business

#### Concerns In Id'ing A Suitable Distribution, Necip Doganaksoy, Gerald J. Hahn, William Q. Meeker

*Statistics Publications*

Analysis of product lifetime data generally requires fitting a suitable distribution to the data at hand. The fitted distribution is used to estimate quantities of interest, such as the fraction of product failing after various times in service and selected distribution percentiles (for example, the estimated time by which 1% of the product population is expected to fail).

Random Search Plus: A More Effective Random Search For Machine Learning Hyperparameters Optimization, 2020 University of Tennessee, Knoxville

#### Random Search Plus: A More Effective Random Search For Machine Learning Hyperparameters Optimization, Bohan Li

*Masters Theses*

Machine learning hyperparameter optimization has always been key to improving model performance. There are many methods of hyperparameter optimization; popular ones include grid search, random search, manual search, Bayesian optimization, and population-based optimization. Random search requires fewer computations than grid search, but at some cost in accuracy. This paper proposes a more effective random search method, named random search plus, based on traditional random search and hyperparameter space separation. This thesis shows empirically that random search plus is more effective than random search. There are some case ...
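As a point of reference, the traditional random search that the thesis builds on can be sketched as below. This is only the baseline, since the abstract does not describe how random search plus separates the hyperparameter space. The function `random_search`, the stand-in objective `toy_loss`, and the hyperparameter names `lr` and `reg` are hypothetical.

```python
import random


def random_search(objective, space, n_trials, seed=0):
    """Sample each hyperparameter uniformly from its range and keep the best trial."""
    rng = random.Random(seed)  # fixed seed so the search is reproducible
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_loss(p):
    """Hypothetical stand-in for a validation loss; its minimum is at lr=0.1, reg=0.5."""
    return (p["lr"] - 0.1) ** 2 + (p["reg"] - 0.5) ** 2


space = {"lr": (0.0, 1.0), "reg": (0.0, 1.0)}
best_params, best_score = random_search(toy_loss, space, n_trials=200)
print(best_params, best_score)
```

The number of objective evaluations is set directly by `n_trials`, whereas a grid with m values per hyperparameter needs m raised to the number of hyperparameters; this is the computational saving that random search trades against coverage of the space.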

Quantifying The Simultaneous Effect Of Socio-Economic Predictors And Build Environment On Spatial Crime Trends, 2020 University of Arkansas, Fayetteville

#### Quantifying The Simultaneous Effect Of Socio-Economic Predictors And Build Environment On Spatial Crime Trends, Alfieri Daniel Ek

*Theses and Dissertations*

Proper allocation of law enforcement agencies falls under the umbrella of risk terrain modeling (Caplan et al., 2011, 2015; Drawve, 2016), which primarily focuses on crime prediction and prevention by spatially aggregating response and predictor variables of interest. Although mental health incidents demand resource allocation from law enforcement agencies and the city, relatively little emphasis has been placed on building spatial models for mental health incidents. Analyzing spatial mental health events in Little Rock, AR from 2015 to 2018, we found evidence of spatial heterogeneity via Moran’s I statistic. A spatial modeling framework is then built using generalized linear ...
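Moran's I, the statistic cited above, is a standard measure of spatial autocorrelation: I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where w_ij are spatial weights and W is their total. The sketch below computes it for a toy row of cells; the binary adjacency weights and the data values are assumptions for illustration and are not the Little Rock data or the weighting scheme used in the thesis.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_total) * (num / den)


def chain_weights(n):
    """Binary adjacency for n cells in a row: neighbors are the cells next to each other."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


W = chain_weights(4)
clustered = morans_i([1, 2, 3, 4], W)      # approximately 0.333: similar values sit together
alternating = morans_i([1, 0, 1, 0], W)    # approximately -1.0: neighbors are dissimilar
print(clustered, alternating)
```

Values well above zero indicate that neighboring areas tend to have similar counts, and values well below zero indicate a checkerboard-like pattern; judging significance requires a reference distribution, for example from permutations, which this sketch does not include.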

Maximum Entropy Classification For Record Linkage, 2020 University of Alabama

#### Maximum Entropy Classification For Record Linkage, Danhyang Lee, Li-Chun Zhang, Jae Kwang Kim

*Statistics Publications*

By record linkage one joins records residing in separate files which are believed to be related to the same entity. In this paper we approach record linkage as a classification problem, and adapt the maximum entropy classification method in text mining to record linkage, both in the supervised and unsupervised settings of machine learning. The set of links will be chosen according to the associated uncertainty. On the one hand, our framework overcomes some persistent theoretical flaws of the classical approach pioneered by Fellegi and Sunter (1969); on the other hand, the proposed algorithm is scalable and fully automatic, unlike ...