Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons

Open Access. Powered by Scholars. Published by Universities.®

Theses/Dissertations

2017

Discipline
Institution
Keyword
Publication

Articles 1 - 30 of 46

Full-Text Articles in Statistical Models

Seasonal Resource Selection And Habitat Treatment Use By A Fringe Population Of Greater Sage-Grouse, Rhett Boswell Dec 2017

Seasonal Resource Selection And Habitat Treatment Use By A Fringe Population Of Greater Sage-Grouse, Rhett Boswell

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Movement and habitat selection by Greater Sage-grouse (Centrocercus uropasianus) is of great interest to wildlife managers tasked with applying conservation measures for this iconic western species. Current technology has created small and lightweight GPS (Global Positioning Systems) transmitters that can be attached to sage-grouse. Using GIS software and statistical programs such as Program R, land managers can analyze GPS location data to assess how sage-grouse are geospatially interacting with their habitats. Within the Panguitch Sage-Grouse Management Area (SGMA) thousands of acres of land have been restored or manipulated to enhance sage-grouse habitat; this usually involves removal of pinyon pine …


Statistical Analysis Of Momentum In Basketball, Mackenzi Stump Dec 2017

Statistical Analysis Of Momentum In Basketball, Mackenzi Stump

Honors Projects

The “hot hand” in sports has been debated for as long as sports have been around. The debate involves whether streaks and slumps in sports are true phenomena or just simply perceptions in the mind of the human viewer. This statistical analysis of momentum in basketball analyzes the distribution of time between scoring events for the BGSU Women’s Basketball team from 2011-2017. We discuss how the distribution of time between scoring events changes with normal game factors such as location of the game, game outcome, and several other factors. If scoring events during a game were always randomly distributed, or …


Developing Leading And Lagging Indicators To Enhance Equipment Reliability In A Lean System, Dhanush Agara Mallesh Dec 2017

Developing Leading And Lagging Indicators To Enhance Equipment Reliability In A Lean System, Dhanush Agara Mallesh

Masters Theses

With increasing complexity in equipment, the failure rates are becoming a critical metric due to the unplanned maintenance in a production environment. Unplanned maintenance in manufacturing process is created issues with downtimes and decreasing the reliability of equipment. Failures in equipment have resulted in the loss of revenue to organizations encouraging maintenance practitioners to analyze ways to change unplanned to planned maintenance. Efficient failure prediction models are being developed to learn about the failures in advance. With this information, failures predicted can reduce the downtimes in the system and improve the throughput.

The goal of this thesis is to predict …


Making Models With Bayes, Pilar Olid Dec 2017

Making Models With Bayes, Pilar Olid

Electronic Theses, Projects, and Dissertations

Bayesian statistics is an important approach to modern statistical analyses. It allows us to use our prior knowledge of the unknown parameters to construct a model for our data set. The foundation of Bayesian analysis is Bayes' Rule, which in its proportional form indicates that the posterior is proportional to the prior times the likelihood. We will demonstrate how we can apply Bayesian statistical techniques to fit a linear regression model and a hierarchical linear regression model to a data set. We will show how to apply different distributions to Bayesian analyses and how the use of a prior affects …


Bayesian Model For Detection Of Outliers In Linear Regression With Application To Longitudinal Data, Zahraa Al-Sharea Dec 2017

Bayesian Model For Detection Of Outliers In Linear Regression With Application To Longitudinal Data, Zahraa Al-Sharea

Graduate Theses and Dissertations

Outlier detection is one of the most important challenges with many present-day applications. Outliers can occur due to uncertainty in data generating mechanisms or due to an error in data recording/processing. Outliers can drastically change the study's results and make predictions less reliable. Detecting outliers in longitudinal studies is quite challenging because this kind of study is working with observations that change over time. Therefore, the same subject can produce an outlier at one point in time produce regular observations at all other time points. A Bayesian hierarchical modeling assigns parameters that can quantify whether each observation is an outlier …


Novel Statistical Models For Quantitative Shape-Gene Association Selection, Xiaotian Dai Dec 2017

Novel Statistical Models For Quantitative Shape-Gene Association Selection, Xiaotian Dai

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Other research reported that genetic mechanism plays a major role in the development process of biological shapes. The primary goal of this dissertation is to develop novel statistical models to investigate the quantitative relationships between biological shapes and genetic variants. However, these problems can be extremely challenging to traditional statistical models for a number of reasons: 1) the biological phenotypes cannot be effectively represented by single-valued traits, while traditional regression only handles one dependent variable; 2) in real-life genetic data, the number of candidate genes to be investigated is extremely large, and the signal-to-noise ratio of candidate genes is expected …


Statistical Modelling, Optimal Strategies And Decisions In Two-Period Economies, Jiang Wu Nov 2017

Statistical Modelling, Optimal Strategies And Decisions In Two-Period Economies, Jiang Wu

Electronic Thesis and Dissertation Repository

Motivated by some real problems, our thesis puts forward two general two-period pricing models and explore optimal buying and selling strategies in two states of the two-period decision, when buyer/seller's decisions in the two periods are uncertain: commodity valuations may or may not be independent, may or may not follow the same distribution, be heavily or just lightly influenced by exogenous economic conditions, and so on. For both the example of buying laptops and the example of selling houses, the connections between each example and the two-envelope paradox encourage us to explore optimal strategies based on the works of McDonnell …


Data-Adaptive Kernel Support Vector Machine, Xin Liu Nov 2017

Data-Adaptive Kernel Support Vector Machine, Xin Liu

Electronic Thesis and Dissertation Repository

In this thesis, we propose the data-adaptive kernel Support Vector Machine (SVM), a new method with a data-driven scaling kernel function based on real data sets. This two-stage approach of kernel function scaling can enhance the accuracy of a support vector machine, especially when the data are imbalanced. Followed by the standard SVM procedure in the first stage, the proposed method locally adapts the kernel function to data locations based on the skewness of the class outcomes. In the second stage, the decision rule is constructed with the data-adaptive kernel function and is used as the classifier. This process enlarges …


Juvenile River Herring In Freshwater Lakes: Sampling Approaches For Evaluating Growth And Survival, Matthew T. Devine Oct 2017

Juvenile River Herring In Freshwater Lakes: Sampling Approaches For Evaluating Growth And Survival, Matthew T. Devine

Masters Theses

River herring, collectively alewives (Alosa pseudoharengus) and blueback herring (A. aestivalis), have experienced substantial population declines over the past five decades due in large part to overfishing, combined with other sources of mortality, and disrupted access to critical freshwater spawning habitats. Anadromous river herring populations are currently assessed by counting adults in rivers during upstream spawning migrations, but no field-based assessment methods exist for estimating juvenile densities in freshwater nursery habitats. Counts of 4-year-old migrating adults are variable and prevent understanding about how mortality acts on different life stages prior to returning to spawn (e.g., juveniles …


Modelling Bird Migration With Motus Data And Bayesian State-Space Models, Justin Baldwin Oct 2017

Modelling Bird Migration With Motus Data And Bayesian State-Space Models, Justin Baldwin

Masters Theses

Bird migration is a poorly-known yet important phenomenon, as understanding movement patterns of birds can inform conservation strategies and public health policy for animal-borne diseases. Recent advances in wildlife tracking technology, in particular the Motus system, have allowed researchers to track even small flying birds and insects with radio transmitters that weigh fractions of a gram. This system relies on a community-based distributed sensor network that detects tagged animals as they move through the detection nodes on journeys that range from small local movements to intercontinental migrations. The quantity of data generated by the Motus system is unprecedented, is on …


On The Estimation Of Penetrance In The Presence Of Competing Risks With Family Data, Daniel Prawira Oct 2017

On The Estimation Of Penetrance In The Presence Of Competing Risks With Family Data, Daniel Prawira

Electronic Thesis and Dissertation Repository

In family studies, we are interested in estimating the penetrance function of the event of interest in the presence of competing risks. Failure to account for competing risks may lead to bias in the estimation of the penetrance function. In this thesis, three statistical challenges are addressed: clustering, missing data, and competing risks. We proposed the cause-specific model with shared frailty and ascertainment correction to account for clustering and competing risks along with ascertainment of families into study. Multiple imputation is used to account for missing data. The simulation study showed good performance of our proposed model in estimating the …


Models And Policies Of Port Carbon Emission Reduction: A Case Study Of The Port Of Dalian, Jiaqiong Zhao Aug 2017

Models And Policies Of Port Carbon Emission Reduction: A Case Study Of The Port Of Dalian, Jiaqiong Zhao

World Maritime University Dissertations

No abstract provided.


Annuity Product Valuation And Risk Measurement Under Correlated Financial And Longevity Risks, Soohong Park Aug 2017

Annuity Product Valuation And Risk Measurement Under Correlated Financial And Longevity Risks, Soohong Park

Electronic Thesis and Dissertation Repository

Longevity risk is a non-diversifiable risk and regarded as a pressing socio-economic challenge of the century. Its accurate assessment and quantification is therefore critical to enable pension-fund companies provide sustainable old-age security and maintain a resilient global insurance market. Fluctuations and a decreasing trend in mortality rates, which give rise to longevity risk, as well as the uncertainty in interest-rate dynamics constitute the two fundamental determinants in pricing and risk management of longevity-dependent products. We also note that historical data reveal some evidence of strong correlation between mortality and interest rates and must be taken into account when modelling their …


Examination And Comparison Of The Performance Of Common Non-Parametric And Robust Regression Models, Gregory F. Malek Aug 2017

Examination And Comparison Of The Performance Of Common Non-Parametric And Robust Regression Models, Gregory F. Malek

Electronic Theses and Dissertations

ABSTRACT

Examination and Comparison of the Performance of Common Non-Parametric and Robust Regression Models

By

Gregory Frank Malek

Stephen F. Austin State University, Masters in Statistics Program,

Nacogdoches, Texas, U.S.A.

g_m_2002@live.com

This work investigated common alternatives to the least-squares regression method in the presence of non-normally distributed errors. An initial literature review identified a variety of alternative methods, including Theil Regression, Wilcoxon Regression, Iteratively Re-Weighted Least Squares, Bounded-Influence Regression, and Bootstrapping methods. These methods were evaluated using a simple simulated example data set, as well as various real data sets, including math proficiency data, Belgian telephone call data, and faculty …


Using Mountain Snowpack To Predict Summer Water Availability In Semiarid Mountain Watersheds, Rebecca Dawn Garst Aug 2017

Using Mountain Snowpack To Predict Summer Water Availability In Semiarid Mountain Watersheds, Rebecca Dawn Garst

Boise State University Theses and Dissertations

In the mountainous landscapes of the western United States, water resources are dominated by snowpack. As temperatures rise in spring and summer, the melting snow produces an increase in river flow levels. Reservoirs are used during this increase to retain surplus water, which is released to supplement growing season water supply once the peak flows decrease to below water demands. Once there is no longer surplus natural flow of water, the water accounting changes – referred to as the day of allocation (DOA), and water previously retained within the reservoir is used to supplement the lower flow levels. The amount …


Data Analysis Methods Using Persistence Diagrams, Andrew Marchese Aug 2017

Data Analysis Methods Using Persistence Diagrams, Andrew Marchese

Doctoral Dissertations

In recent years, persistent homology techniques have been used to study data and dynamical systems. Using these techniques, information about the shape and geometry of the data and systems leads to important information regarding the periodicity, bistability, and chaos of the underlying systems. In this thesis, we study all aspects of the application of persistent homology to data analysis. In particular, we introduce a new distance on the space of persistence diagrams, and show that it is useful in detecting changes in geometry and topology, which is essential for the supervised learning problem. Moreover, we introduce a clustering framework directly …


Classification With Large Sparse Datasets: Convergence Analysis And Scalable Algorithms, Xiang Li Jul 2017

Classification With Large Sparse Datasets: Convergence Analysis And Scalable Algorithms, Xiang Li

Electronic Thesis and Dissertation Repository

Large and sparse datasets, such as user ratings over a large collection of items, are common in the big data era. Many applications need to classify the users or items based on the high-dimensional and sparse data vectors, e.g., to predict the profitability of a product or the age group of a user, etc. Linear classifiers are popular choices for classifying such datasets because of their efficiency. In order to classify the large sparse data more effectively, the following important questions need to be answered.

1. Sparse data and convergence behavior. How different properties of a dataset, such as …


Visualizing Lab And Phenotype Associations Using Phewas And Electronic Health Records, Brenda Emerson, Miriam Goldman, Sahiti Kolli Jul 2017

Visualizing Lab And Phenotype Associations Using Phewas And Electronic Health Records, Brenda Emerson, Miriam Goldman, Sahiti Kolli

Honors Projects

As the digitization of patient health records is becoming more common, we are given a great opportunity to analyze these records and hopefully make discoveries about diseases or medicines. Being given large datasets of Electronic Health Records, I and two other students decided to look for novel phenotype associations with mean lab values, look to see whether the presence of a lab had associations with a phenotype, and create an interactive application to visual the associations between labs and phenotypes.


Information Metrics For Predictive Modeling And Machine Learning, Kostantinos Gourgoulias Jul 2017

Information Metrics For Predictive Modeling And Machine Learning, Kostantinos Gourgoulias

Doctoral Dissertations

The ever-increasing complexity of the models used in predictive modeling and data science and their use for prediction and inference has made the development of tools for uncertainty quantification and model selection especially important. In this work, we seek to understand the various trade-offs associated with the simulation of stochastic systems. Some trade-offs are computational, e.g., execution time of an algorithm versus accuracy of simulation. Others are analytical: whether or not we are able to find tractable substitutes for quantities of interest, e.g., distributions, ergodic averages, etc. The first two chapters of this thesis deal with the study of the …


Statistical Methods For High Dimensional Data Arising From Large Epidemiological Studies, Hui Xu Jul 2017

Statistical Methods For High Dimensional Data Arising From Large Epidemiological Studies, Hui Xu

Doctoral Dissertations

In this thesis, we propose statistical models for addressing commonly encountered data types and study designs in large epidemiologic investigations aimed at understanding the molecular basis of complex disorders. The motivating applications come from diverse disease areas in Women's Health, including the study of type II diabetes in the Women's Health Initiative (WHI), invasive breast cancer in the Nurses' Health Study and the study of the metabolomic underpinnings of cardiovascular disease in the WHI. We have also put significant effort into making the implementation of the proposed methods accessible through freely available, user-friendly software packages in R. The first chapter …


Factor Based Statistical Arbitrage In The U.S. Equity Market With A Model Breakdown Detection Process, Seoungbyung Park Jul 2017

Factor Based Statistical Arbitrage In The U.S. Equity Market With A Model Breakdown Detection Process, Seoungbyung Park

Master's Theses (2009 -)

Many researchers have studied different strategies of statistical arbitrage to provide a steady stream of returns that are unrelated to the market condition. Among different strategies, factor-based mean reverting strategies have been popular and covered by many. This thesis aims to add value by evaluating the generalized pairs trading strategy and suggest enhancements to improve out-of-sample performance. The enhanced strategy generated the daily Sharpe ratio of 6.07% in the out-of-sample period from January 2013 through October 2016 with the correlation of -.03 versus S&P 500. During the same period, S&P 500 generated the Sharpe ratio of 6.03%. This thesis is …


Gridiron-Gurus Final Report: Fantasy Football Performance Prediction, Kyle Tanemura, Michael Li, Erica Dorn, Ryan Mckinney Jun 2017

Gridiron-Gurus Final Report: Fantasy Football Performance Prediction, Kyle Tanemura, Michael Li, Erica Dorn, Ryan Mckinney

Computer Science and Software Engineering

Gridiron Gurus is a desktop application that allows for the creation of custom AI profiles to help advise and compete against in a Fantasy Football setting. Our AI are capable of performing statistical prediction of players on both a season long and week to week basis giving them the ability to both draft and manage a fantasy football team throughout a season.


Mortgage Transition Model Based On Loanperformance Data, Shuyao Yang May 2017

Mortgage Transition Model Based On Loanperformance Data, Shuyao Yang

Arts & Sciences Electronic Theses and Dissertations

The unexpected increase in loan default on the mortgage market is widely considered to be one of the main cause behind the economic crisis. To provide some insight on loan delinquency and default, I analyze the mortgage performance data from Fannie Mae website and investigate how economic factors and individual loan and borrower information affect the events of default and prepaid. Various delinquency status including default and prepaid are treated as discrete states of a Markov chain. One-step transition probabilities are estimated via multinomial logistic models. We find that in general current loan-to-value ratio, credit score, unemployment rate, and interest …


Multidataset Independent Subspace Analysis: A Framework For Analysis Of Multimodal, Multi-Subject Brain Imaging Data, Rogers F. Silva May 2017

Multidataset Independent Subspace Analysis: A Framework For Analysis Of Multimodal, Multi-Subject Brain Imaging Data, Rogers F. Silva

Electrical and Computer Engineering ETDs

Mental illnesses are serious disorders of the brain that have devastating effects on individuals and society. In addition to their disabling and impairing effects, mental illnesses have deep social and economical implications, accounting for an estimated loss of 12 billion working days and a care cost surge to $6 trillion a year by 2030. For diseases such as depression and anxiety, enhancing preventive programs and treatment accessibility, in combination with accurate early diagnosis and personalized treatments, are projected to result in a four-fold return on every dollar invested, a strategy that can drastically help curtail those losses. Notably, within the …


On Post-Selection Confidence Intervals In Linear Regression, Xinwei Zhang May 2017

On Post-Selection Confidence Intervals In Linear Regression, Xinwei Zhang

Arts & Sciences Electronic Theses and Dissertations

The general goal of this thesis is to investigate and examine some issues about post-selection inference which arises from the setting where statistical inference is carried out after a datadriven model selection step. In this setting, the classical inference theory which requires a fixed priori model becomes invalid since the selected model is a result of random event. Hence, a common practice in applied research which ignores the model selection and builds up confidence interval will result in misleading or even false conclusion. In this thesis, specifically, we first discusses some examples to show how the classical inference theory loses …


Comparison Of Survival Curves Between Cox Proportional Hazards, Random Forests, And Conditional Inference Forests In Survival Analysis, Brandon Weathers May 2017

Comparison Of Survival Curves Between Cox Proportional Hazards, Random Forests, And Conditional Inference Forests In Survival Analysis, Brandon Weathers

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Survival analysis methods are a mainstay of the biomedical fields but are finding increasing use in other disciplines including finance and engineering. A widely used tool in survival analysis is the Cox proportional hazards regression model. For this model, all the predicted survivor curves have the same basic shape, which may not be a good approximation to reality. In contrast the Random Survival Forests does not make the proportional hazards assumption and has the flexibility to model survivor curves that are of quite different shapes for different groups of subjects. We applied both techniques to a number of publicly available …


A Comparison Of Statistical Methods Relating Pairwise Distance To A Binary Subject-Level Covariate, Rachael Stone May 2017

A Comparison Of Statistical Methods Relating Pairwise Distance To A Binary Subject-Level Covariate, Rachael Stone

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

A community ecologist provided a motivating data set involving a certain animal species with two behavior groups, along with a pairwise genetic distance matrix among individuals. Many community ecologists have analyzed similar data sets with a method known as the Hopkins method, testing for an association between the subject-level covariate (behavior group) and the pairwise distance. This community ecologist wanted to know if they used the Hopkins method, would their results be meaningful? Their question inspired this thesis work, where a different data set was used for confidentiality reasons. Multiple methods (Hopkins method, ADONIS, ANOSIM, and Distance Regression) were used …


A Bayesian Variable Selection Method With Applications To Spatial Data, Xiahan Tang May 2017

A Bayesian Variable Selection Method With Applications To Spatial Data, Xiahan Tang

Graduate Theses and Dissertations

This thesis first describes the general idea behind Bayes Inference, various sampling methods based on Bayes theorem and many examples. Then a Bayes approach to model selection, called Stochastic Search Variable Selection (SSVS) is discussed. It was originally proposed by George and McCulloch (1993). In a normal regression model where the number of covariates is large, only a small subset tend to be significant most of the times. This Bayes procedure specifies a mixture prior for each of the unknown regression coefficient, the mixture prior was originally proposed by Geweke (1996). This mixture prior will be updated as data becomes …


Statistical Methods For Two Problems In Cancer Research: Analysis Of Rna-Seq Data From Archival Samples And Characterization Of Onset Of Multiple Primary Cancers, Jialu Li May 2017

Statistical Methods For Two Problems In Cancer Research: Analysis Of Rna-Seq Data From Archival Samples And Characterization Of Onset Of Multiple Primary Cancers, Jialu Li

Dissertations & Theses (Open Access)

My dissertation is focused on quantitative methodology development and application for two important topics in translational and clinical cancer research.

The first topic was motivated by the challenge of applying transcriptome sequencing (RNA-seq) to formalin-fixation and paraffin-embedding (FFPE) tumor samples for reliable diagnostic development. We designed a biospecimen study to directly compare gene expression results from different protocols to prepare libraries for RNA-seq from human breast cancer tissues, with randomization to fresh-frozen (FF) or FFPE conditions. To comprehensively evaluate the FFPE RNA-seq data quality for expression profiling, we developed multiple computational methods for assessment, such as the uniformity and continuity …


Performance Of Imputation Algorithms On Artificially Produced Missing At Random Data, Tobias O. Oketch May 2017

Performance Of Imputation Algorithms On Artificially Produced Missing At Random Data, Tobias O. Oketch

Electronic Theses and Dissertations

Missing data is one of the challenges we are facing today in modeling valid statistical models. It reduces the representativeness of the data samples. Hence, population estimates, and model parameters estimated from such data are likely to be biased.

However, the missing data problem is an area under study, and alternative better statistical procedures have been presented to mitigate its shortcomings. In this paper, we review causes of missing data, and various methods of handling missing data. Our main focus is evaluating various multiple imputation (MI) methods from the multiple imputation of chained equation (MICE) package in the statistical software …