Cointegration And Statistical Arbitrage Of Precious Metals, 2021 University of Arkansas, Fayetteville

#### Cointegration And Statistical Arbitrage Of Precious Metals, Judge Van Horn

*Finance Undergraduate Honors Theses*

When discussing financial instruments, correlation is often cited as a measure of the relationship between two securities. A measure that is often more useful, and more tradeable, is cointegration. Cointegration measures two securities' tendency to revert to an average price over time. In other words, cointegration ignores directionality and only cares about the distance between two securities. For a mean-reversion strategy such as statistical arbitrage, cointegration proves to be a far more reliable statistical measure than correlation, although it still has its own problems. One thing to consider is ...
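The idea of tracking the distance between two securities can be sketched as follows. This is a minimal illustration on synthetic data, not the thesis's method: the price series, the hedge ratio fitted by ordinary least squares, and the z-score thresholds of ±2 are all assumptions, and no formal cointegration test (such as a unit-root test on the residuals) is performed.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # reproducible synthetic data; these are not real metal prices

# x follows a random walk; y tracks 2*x plus stationary noise, so the pair is
# cointegrated by construction (hypothetical stand-ins for two metals).
x = [100.0]
for _ in range(499):
    x.append(x[-1] + random.gauss(0, 1))
y = [5.0 + 2.0 * xi + random.gauss(0, 0.5) for xi in x]


def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


a, beta = ols(x, y)

# The spread is the regression residual: the "distance" between the two series
# after scaling x by the estimated hedge ratio beta.
spread = [yi - (a + beta * xi) for xi, yi in zip(x, y)]
sd = pstdev(spread)
z = [s / sd for s in spread]  # OLS residuals with an intercept have mean zero

# Toy rule with assumed thresholds: bet on reversion when |z| exceeds 2.
signals = ["short spread" if zi > 2 else "long spread" if zi < -2 else "flat"
           for zi in z]
```

In practice the hedge ratio would be estimated on one window and traded on a later one; fitting and trading on the same data, as done here for brevity, overstates how well the spread reverts.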

Markov Chains And Their Applications, 2021 University of Texas at Tyler

#### Markov Chains And Their Applications, Fariha Mahfuz

*Math Theses*

A Markov chain is a stochastic model used to predict future events. It is relatively simple because it requires only the information of the present state to predict future states. In this paper we go over the basic concepts of Markov chains and several of their applications, including the Google PageRank algorithm, weather prediction, and gambler's ruin.

We examine how the Google PageRank algorithm works efficiently to provide a PageRank for a Google search result. We also show how we can use a Markov chain to predict weather by creating a model from real-life data.
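A small sketch of the weather application, assuming a hypothetical two-state chain: the transition probabilities below are illustrative and are not taken from the thesis or from real data. It shows that the next-day forecast depends only on the present state, and that repeatedly applying the transition matrix converges to a long-run distribution, which is the same power-iteration idea that underlies PageRank.

```python
# Hypothetical two-state weather chain; row i holds P(next state | current state i).
STATES = ["sunny", "rainy"]
P = [
    [0.8, 0.2],  # sunny -> sunny, sunny -> rainy
    [0.4, 0.6],  # rainy -> sunny, rainy -> rainy
]


def step(dist, P):
    """Advance a probability distribution over states by one time step."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def stationary(P, tol=1e-12, max_iter=10_000):
    """Approximate the stationary distribution by power iteration."""
    n = len(P)
    dist = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = step(dist, P)
        if max(abs(a - b) for a, b in zip(dist, nxt)) < tol:
            return nxt
        dist = nxt
    return dist


tomorrow = step([1.0, 0.0], P)  # forecast given that today is sunny
long_run = stationary(P)        # close to [2/3, 1/3] for this matrix
```

For this matrix the stationary distribution can be checked by hand: solving pi = pi P gives a long-run probability of 2/3 for sunny and 1/3 for rainy.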

Predicting The Number Of Future Events, 2021 Iowa State University

#### Predicting The Number Of Future Events, Qinglong Tian, Fanqi Meng, Daniel J. Nordman, William Q. Meeker

*Statistics Publications*

This paper describes prediction methods for the number of future events from a population of units associated with an on-going time-to-event process. Examples include the prediction of warranty returns and the prediction of the number of future product failures that could cause serious threats to property or life. Important decisions such as whether a product recall should be mandated are often based on such predictions. Data, generally right-censored (and sometimes left truncated and right-censored), are used to estimate the parameters of a time-to-event distribution. This distribution can then be used to predict the number of events over future periods of ...
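As a rough illustration of the point-prediction step only, the sketch below sums each surviving unit's conditional probability of failing within a future window. The Weibull form, the parameter values, and the unit ages are assumptions made for the example; the paper's own prediction methods and intervals are not reproduced here.

```python
import math


def weibull_cdf(t, shape, scale):
    """Probability that a unit has failed by time t under a Weibull model."""
    return 1.0 - math.exp(-((t / scale) ** shape))


def expected_future_events(ages, horizon, shape, scale):
    """Expected number of failures in (age, age + horizon] among surviving units.

    Each unit contributes P(fail by age + horizon | survived to age).
    """
    total = 0.0
    for age in ages:
        survived = 1.0 - weibull_cdf(age, shape, scale)
        fails_in_window = (weibull_cdf(age + horizon, shape, scale)
                           - weibull_cdf(age, shape, scale))
        total += fails_in_window / survived
    return total


# Hypothetical fleet: current ages of units still in service (right-censored).
ages = [1.0, 2.0, 3.0]
prediction = expected_future_events(ages, horizon=1.0, shape=1.0, scale=10.0)
```

With shape 1 the Weibull reduces to the exponential distribution, which is memoryless, so every unit has the same conditional failure probability regardless of age. That gives a simple sanity check on the function.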

SARS-CoV-2 Pandemic Analytical Overview With Machine Learning Predictability, 2021 Southern Methodist University

#### SARS-CoV-2 Pandemic Analytical Overview With Machine Learning Predictability, Anthony Tanaydin, Jingchen Liang, Daniel W. Engels

*SMU Data Science Review*

Understanding diagnostic tests and examining important features of novel coronavirus (COVID-19) infection are essential steps for controlling the current pandemic of 2020. In this paper, we study the relationship between clinical diagnosis and analytical features of patient blood panels from the US, Mexico, and Brazil. Our analysis confirms that among adults, the risk of severe illness from COVID-19 increases with pre-existing conditions such as diabetes and immunosuppression. More than eight months into the pandemic, more data have become available, indicating that more young adults were getting infected. In addition, we expand on the definition of COVID-19 test and discuss ...

An Evaluation Of Knot Placement Strategies For Spline Regression, 2021 Claremont Colleges

#### An Evaluation Of Knot Placement Strategies For Spline Regression, William Klein

*CMC Senior Theses*

Regression splines have an established value for producing a quality fit with a relatively low-degree polynomial. This paper explores the implications of adopting new methods for knot selection in tandem with established methodology from the current literature. Structural features of generated datasets, as well as residuals collected from sequential iterative models, are used to augment the equidistant knot selection process. From analyzing a simulated dataset and an application to the Racial Animus dataset, I find that a B-spline basis paired with equally spaced knots remains the best choice when data are evenly distributed, even when structural features of a dataset are known ...

Joint Modeling Of Distances And Times In Point-Count Surveys, 2021 University of Michigan

#### Joint Modeling Of Distances And Times In Point-Count Surveys, Adam Martin-Schwarze, Jarad Niemi, Philip Dixon

*Statistics Publications*

Removal and distance modeling are two common methods to adjust counts for imperfect detection in point-count surveys. Several recent articles have formulated models to combine them into a distance-removal framework. We observe that these models fall into two groups building from different assumptions about the joint distribution of observed distances and first times to detection. One approach assumes the joint distribution results from a Poisson process (PP). The other assumes an independent joint (IJ) distribution with its joint density being the product of its marginal densities. We compose an IJ+PP model that more flexibly models the joint distribution and ...

Bayesian Semi-Supervised Keyphrase Extraction And Jackknife Empirical Likelihood For Assessing Heterogeneity In Meta-Analysis, 2020 Southern Methodist University

#### Bayesian Semi-Supervised Keyphrase Extraction And Jackknife Empirical Likelihood For Assessing Heterogeneity In Meta-Analysis, Guanshen Wang

*Statistical Science Theses and Dissertations*

This dissertation investigates: (1) A Bayesian Semi-supervised Approach to Keyphrase Extraction with Only Positive and Unlabeled Data, (2) Jackknife Empirical Likelihood Confidence Intervals for Assessing Heterogeneity in Meta-analysis of Rare Binary Events.

In the big data era, people are blessed with a huge amount of information. However, the availability of information may also pose great challenges. One big challenge is how to extract useful yet succinct information in an automated fashion. As one of the first few efforts, keyphrase extraction methods summarize an article by identifying a list of keyphrases. Many existing keyphrase extraction methods focus on the unsupervised setting ...

Improved Statistical Methods For Time-Series And Lifetime Data, 2020 Southern Methodist University

#### Improved Statistical Methods For Time-Series And Lifetime Data, Xiaojie Zhu

*Statistical Science Theses and Dissertations*

In this dissertation, improved statistical methods for time-series and lifetime data are developed. First, an improved trend test for time series data is presented. Then, robust parametric estimation methods based on system lifetime data with known system signatures are developed.

In the first part of this dissertation, we consider a test for the monotonic trend in time series data proposed by Brillinger (1989). It has been shown that when there are highly correlated residuals or short record lengths, Brillinger’s test procedure tends to have a significance level much higher than the nominal level. This could be related to the discrepancy ...

Quantifying The Simultaneous Effect Of Socio-Economic Predictors And Build Environment On Spatial Crime Trends, 2020 University of Arkansas, Fayetteville

#### Quantifying The Simultaneous Effect Of Socio-Economic Predictors And Build Environment On Spatial Crime Trends, Alfieri Daniel Ek

*Theses and Dissertations*

Proper allocation of law enforcement agencies falls under the umbrella of risk terrain modeling (Caplan et al., 2011, 2015; Drawve, 2016), which primarily focuses on crime prediction and prevention by spatially aggregating response and predictor variables of interest. Although mental health incidents demand resource allocation from law enforcement agencies and the city, relatively little emphasis has been placed on building spatial models for mental health incidents. Analyzing spatial mental health events in Little Rock, AR from 2015 to 2018, we found evidence of spatial heterogeneity via Moran’s I statistic. A spatial modeling framework is then built using generalized linear ...
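For readers unfamiliar with Moran's I, the statistic named in the abstract, here is a minimal sketch of its standard formula on a toy example. The five values and the neighbor-based weight matrix are hypothetical and have nothing to do with the Little Rock data; the thesis's choice of spatial weights is not stated in the excerpt.

```python
def morans_i(values, weights):
    """Moran's I: (n / W) * sum_ij w_ij * d_i * d_j / sum_i d_i**2,

    where d_i is the deviation of value i from the mean and W is the sum of
    all spatial weights.
    """
    n = len(values)
    m = sum(values) / n
    dev = [v - m for v in values]
    w_total = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_total) * (num / den)


# Five hypothetical regions arranged on a line; adjacent regions are neighbors.
values = [1, 2, 3, 4, 5]
weights = [[1 if abs(i - j) == 1 else 0 for j in range(5)] for i in range(5)]

moran = morans_i(values, weights)  # about 0.5: neighbors have similar values
```

Values well above zero indicate that neighboring regions tend to resemble each other, values near zero suggest no spatial pattern, and negative values indicate that neighbors tend to differ.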

Concerns In Id'ing A Suitable Distribution, 2020 Siena College School of Business

#### Concerns In Id'ing A Suitable Distribution, Necip Doganaksoy, Gerald J. Hahn, William Q. Meeker

*Statistics Publications*

Analysis of product lifetime data generally requires fitting a suitable distribution to the data at hand. The fitted distribution is used to estimate quantities of interest, such as the fraction of product failing after various times in service and selected distribution percentiles (for example, the estimated time by which 1% of the product population is expected to fail).
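The two quantities mentioned above can be computed directly once a distribution has been fitted. The sketch below assumes a Weibull distribution with made-up shape and scale values; the article does not say which distribution is appropriate, and choosing one is exactly the concern it raises.

```python
import math


def weibull_cdf(t, shape, scale):
    """Fraction of the population expected to have failed by time t."""
    return 1.0 - math.exp(-((t / scale) ** shape))


def weibull_quantile(p, shape, scale):
    """Time by which a fraction p of the population is expected to fail."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


# Hypothetical fitted parameters (illustrative only, not from any data set).
shape, scale = 1.5, 10.0

frac_failed_by_5 = weibull_cdf(5.0, shape, scale)   # fraction failing by t = 5
t_one_percent = weibull_quantile(0.01, shape, scale)  # the 1% percentile
```

The quantile function is the inverse of the CDF, so plugging `t_one_percent` back into `weibull_cdf` should return 0.01. Low percentiles like this one are sensitive to the assumed distribution, since they usually fall outside the range where most failures were observed.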

Maximum Entropy Classification For Record Linkage, 2020 University of Alabama

#### Maximum Entropy Classification For Record Linkage, Danhyang Lee, Li-Chun Zhang, Jae Kwang Kim

*Statistics Publications*

By record linkage one joins records residing in separate files which are believed to be related to the same entity. In this paper we approach record linkage as a classification problem, and adapt the maximum entropy classification method in text mining to record linkage, both in the supervised and unsupervised settings of machine learning. The set of links will be chosen according to the associated uncertainty. On the one hand, our framework overcomes some persistent theoretical flaws of the classical approach pioneered by Fellegi and Sunter (1969); on the other hand, the proposed algorithm is scalable and fully automatic, unlike ...

Semiparametric Imputation Using Conditional Gaussian Mixture Models Under Item Nonresponse, 2020 University of Alabama

#### Semiparametric Imputation Using Conditional Gaussian Mixture Models Under Item Nonresponse, Danhyang Lee, Jae Kwang Kim

*Statistics Publications*

Imputation is a popular technique for handling item nonresponse in survey sampling. Parametric imputation is based on a parametric model for imputation and is less robust against the failure of the imputation model. Nonparametric imputation is fully robust but is not applicable when the dimension of covariates is large due to the curse of dimensionality. Semiparametric imputation is another robust imputation based on a flexible model where the number of model parameters can increase with the sample size. In this paper, we propose another semiparametric imputation based on a more flexible model assumption than the Gaussian mixture model. In the ...

Exploring The Relationship Between Children’s Vocabulary And Their Understanding Of Cardinality: A Methodological Approach, 2020 University of Connecticut - Storrs

#### Exploring The Relationship Between Children’s Vocabulary And Their Understanding Of Cardinality: A Methodological Approach, Justin Slifer, Emily Carrigan, Kristin Walker, Marie Coppola

*Honors Scholar Theses*

Is there a relationship between vocabulary and children’s understanding of cardinality? Does the way in which we classify cardinality data as tested by the Give-a-Number task affect finding such a relationship? This thesis explored these questions using a methodological approach, by testing the relationship between children’s receptive vocabulary scores and Give-a-Number scores classified in two different ways, the traditional knower-level assessment, as well as by calculating the proportion of trials answered correctly. A significant correlation was found between participants’ receptive vocabulary scores and Give-a-Number scores using both manners of classification, independent of the children’s ages. The results ...

Predicting Postoperative Delirium Risk For Intracranial Surgery: A Statistical Machine Learning Approach, 2020 Purdue University

#### Predicting Postoperative Delirium Risk For Intracranial Surgery: A Statistical Machine Learning Approach, Juliet Aygun, Alaina Bartfeld, Sahana Rayan

*The Journal of Purdue Undergraduate Research*

No abstract provided.

Statistical Methods For Resolving Intratumor Heterogeneity With Single-Cell DNA Sequencing, 2020 The University of Texas MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences

#### Statistical Methods For Resolving Intratumor Heterogeneity With Single-Cell DNA Sequencing, Alexander Davis

*The University of Texas MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences Dissertations and Theses (Open Access)*

Tumor cells have heterogeneous genotypes, which drives progression and treatment resistance. Such genetic intratumor heterogeneity plays a role in the process of clonal evolution that underlies tumor progression and treatment resistance. Single-cell DNA sequencing is a promising experimental method for studying intratumor heterogeneity, but brings unique statistical challenges in interpreting the resulting data. Researchers lack methods to determine whether sufficiently many cells have been sampled from a tumor. In addition, there are no proven computational methods for determining the ploidy of a cell, a necessary step in the determination of copy number. In this work, software for calculating probabilities from ...

Reliability Disasters: Technical Learnings From Past Mistakes To Mitigate And Avoid Future Catastrophes, 2020 Siena College

#### Reliability Disasters: Technical Learnings From Past Mistakes To Mitigate And Avoid Future Catastrophes, Necip Doganaksoy, William Q. Meeker, Gerald J. Hahn

*Statistics Publications*

When products fail in the field, disasters can result. To head off problems, manufacturers must build reliability into the design of products and processes. Statistics can be used proactively to help improve reliability during product design and development and enable manufacturers to “do it right the first time.” The authors describe some technical and statistical problems from four specific reliability disasters to highlight lessons learned.

Lectures On Mathematical Computing With Python, 2020 Portland State University

#### Lectures On Mathematical Computing With Python, Jay Gopalakrishnan

*PDXOpen: Open Educational Resources*

This open resource is a collection of class activities for use in undergraduate courses aimed at teaching mathematical computing, and computational thinking in general, using the Python programming language. It was developed for a second-year course (MTH 271) revamped for a new undergraduate program in data science at Portland State University. The activities are designed to guide students in using Python modules effectively for scientific computation, data analysis, and visualization.


Statistical Methodology To Establish A Benchmark For Evaluating Antimicrobial Resistance Genes Through Real Time PCR Assay, 2020 University of Nebraska - Lincoln

#### Statistical Methodology To Establish A Benchmark For Evaluating Antimicrobial Resistance Genes Through Real Time PCR Assay, Enakshy Dutta

*Dissertations and Theses in Statistics*

Novel diagnostic tests are usually compared with gold standard tests to evaluate diagnostic accuracy. For assessing antimicrobial resistance (AMR) in bovine respiratory disease (BRD) pathogens, the phenotypic broth microdilution method is used as the gold standard (GS). The objective of the thesis is to evaluate the optimal cycle threshold (Ct) generated by real-time polymerase chain reaction (rtPCR) for genes that confer resistance, such that it translates to the phenotypic classification of AMR. Data from two different methodologies are assessed to identify the Ct that will discriminate between resistance (R) and susceptibility (S). First, the receiver operating characteristic (ROC) curve was used to determine the ...

Latent Class Models For At-Risk Populations, 2020 University of Massachusetts Amherst

#### Latent Class Models For At-Risk Populations, Shuaimin Kang

*Doctoral Dissertations*

**Clustering Network Tree Data From Respondent-Driven Sampling With Application to Opioid Users in New York City**

There is great interest in finding meaningful subgroups of attributed network data. Many methods are available for clustering complete networks. Unfortunately, much network data is collected through sampling and is therefore incomplete. Respondent-driven sampling (RDS) is a widely used method for sampling hard-to-reach human populations based on tracing links in the underlying unobserved social network. The resulting data therefore have a tree structure representing a sub-sample of the network, along with many nodal attributes. In this paper, we introduce an approach to adjust mixture models ...

Improving The Quality And Design Of Retrospective Clinical Outcome Studies That Utilize Electronic Health Records, 2020 HCA Healthcare Mountain MidAmerica and Continental Divisions

#### Improving The Quality And Design Of Retrospective Clinical Outcome Studies That Utilize Electronic Health Records, Oliwier Dziadkowiec, Jeffery Durbin, Vignesh Jayaraman Muralidharan, Megan Novak, Brendon Cornett

*HCA Healthcare Journal of Medicine*

Electronic health records (EHRs) are an excellent source for secondary data analysis. Studies based on EHR-derived data, if designed properly, can answer previously unanswerable clinical research questions. In this paper we highlight the benefits of large retrospective studies from secondary sources such as EHRs and examine retrospective cohort and case-control study design challenges, as well as the methodological and statistical adjustments that can be made to overcome some of the inherent design limitations, in order to increase the generalizability, validity, and reliability of the results obtained from these studies.