Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Articles 1 - 20 of 20

Full-Text Articles in Entire DC Network

Bayesian Semi-Supervised Keyphrase Extraction And Jackknife Empirical Likelihood For Assessing Heterogeneity In Meta-Analysis, Guanshen Wang Dec 2020

Statistical Science Theses and Dissertations

This dissertation investigates: (1) A Bayesian Semi-supervised Approach to Keyphrase Extraction with Only Positive and Unlabeled Data, (2) Jackknife Empirical Likelihood Confidence Intervals for Assessing Heterogeneity in Meta-analysis of Rare Binary Events.

In the big data era, people are blessed with a huge amount of information. However, the availability of information may also pose great challenges. One big challenge is how to extract useful yet succinct information in an automated fashion. As one of the first few efforts, keyphrase extraction methods summarize an article by identifying a list of keyphrases. Many existing keyphrase extraction methods focus on the unsupervised setting, …
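The core computation behind a jackknife empirical likelihood (JEL) interval can be sketched in a few lines. This is a generic sketch, not the dissertation's procedure: the estimator passed in (for heterogeneity, some between-study statistic) is a placeholder you supply, and the names `pseudo_values` and `jel_statistic` are hypothetical. The jackknife turns a nonlinear statistic into approximately independent pseudo-values, and ordinary empirical likelihood for a mean is then applied to them.

```python
import math
from typing import Callable, Sequence

Estimator = Callable[[Sequence[float]], float]

def pseudo_values(data: Sequence[float], estimator: Estimator) -> list[float]:
    """Jackknife pseudo-values V_i = n*T(all) - (n-1)*T(all but i)."""
    n = len(data)
    full = estimator(data)
    return [n * full - (n - 1) * estimator(list(data[:i]) + list(data[i + 1:]))
            for i in range(n)]

def jel_statistic(data: Sequence[float], estimator: Estimator, theta: float) -> float:
    """-2 log empirical likelihood ratio for the mean of the pseudo-values at theta."""
    z = [v - theta for v in pseudo_values(data, estimator)]
    if all(zi == 0 for zi in z):
        return 0.0
    if min(z) >= 0 or max(z) <= 0:
        return math.inf  # theta outside the convex hull: the EL ratio is zero
    # Lagrange multiplier: solve sum z_i / (1 + lam*z_i) = 0. The left side is
    # decreasing in lam wherever every 1 + lam*z_i > 0, so bisection is safe.
    lo, hi = -1.0 / max(z), -1.0 / min(z)
    pad = 1e-12 * (hi - lo)
    lo, hi = lo + pad, hi - pad
    for _ in range(200):
        lam = 0.5 * (lo + hi)
        if sum(zi / (1.0 + lam * zi) for zi in z) > 0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)
```

Under the usual regularity conditions the statistic is approximately chi-squared with one degree of freedom, so an approximate 95% interval is the set of `theta` values whose statistic does not exceed about 3.841.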


Examining Multiple Imputation For Measurement Error Correction In Count Data With Excess Zeros, Shalima Zalsha Dec 2020

Statistical Science Theses and Dissertations

Measurement error and missing data are two common problems in wildlife population surveys. These data are collected from the environment and may be missing or measured with error when the observer’s ability to see the animal is obscured. Methods such as video transects for estimating red snapper abundance and aerial surveys for estimating moose population sizes are highly affected by these problems since total abundance will be underestimated if missing/mismeasured counts are ignored. We shall refer to this problem as visibility bias; it occurs when the true counts are observed when visibility is high, partially observed when visibility is low …
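Whatever imputation model is used to correct the missing or mismeasured counts, the analyses of the M completed data sets are commonly combined with Rubin's rules. The sketch shows only that standard pooling step, not the thesis's imputation model; the function name `pool_rubin` is hypothetical.

```python
import statistics

def pool_rubin(estimates, variances):
    """Combine M completed-data results with Rubin's rules.

    estimates: point estimates Q_m (e.g. an abundance estimate per imputed data set)
    variances: their squared standard errors U_m
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = statistics.fmean(estimates)       # pooled point estimate
    within = statistics.fmean(variances)      # within-imputation variance W
    between = statistics.variance(estimates)  # between-imputation variance B (M - 1 denominator)
    total = within + (1 + 1 / m) * between    # T = W + (1 + 1/M) B
    return q_bar, total
```

The between-imputation term is what carries the extra uncertainty from not observing the true counts; ignoring it would understate the standard error.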


Improved Statistical Methods For Time-Series And Lifetime Data, Xiaojie Zhu Dec 2020

Statistical Science Theses and Dissertations

In this dissertation, improved statistical methods for time-series and lifetime data are developed. First, an improved trend test for time series data is presented. Then, robust parametric estimation methods based on system lifetime data with known system signatures are developed.

In the first part of this dissertation, we consider a test for a monotonic trend in time series data proposed by Brillinger (1989). It has been shown that when there are highly correlated residuals or short record lengths, Brillinger’s test procedure tends to have a significance level much higher than the nominal level. This could be related to the discrepancy between …


Biennial And Low-Frequency Components Of El Niño/Southern Oscillation, James Michael Ryan Aug 2020

Theses and Dissertations

El Niño/Southern Oscillation (ENSO) is a coupled oscillation of sea surface temperatures (SSTs), winds, and air pressure in the eastern and central tropical Pacific that repeats quasi-regularly every 2–7 years. Although ENSO’s spectral peak is found at a 4–7-yr period, composite El Niño events, taken as the 84 months before and after the peak of each El Niño, show that the length of each event, and often of the following La Niña if there is one, usually falls within a quasi-biennial (QB) range of around 18–42 months. We argue that the biennial range of ENSO events stems from the …


Bayesian Topological Machine Learning, Christopher A. Oballe Aug 2020

Doctoral Dissertations

Topological data analysis encompasses a broad set of ideas and techniques that address 1) how to rigorously define and summarize the shape of data, and 2) how to use these constructs for inference. This dissertation addresses the second problem by developing new inferential tools for topological data analysis and applying them to solve real-world data problems. First, a Bayesian framework to approximate probability distributions of persistence diagrams is established. The key insight underpinning this framework is that persistence diagrams may be viewed as Poisson point processes with prior intensities. With this assumption in hand, one may compute posterior intensities by adopting techniques …


Southwest Pacific Tropical Cyclone Frequency And Intensity Related To Observed And Modeled Geophysical And Aerosol Variables, Rupsa Bhowmick Jul 2020

LSU Doctoral Dissertations

The dissertation focuses on the tropical cyclone (TC) climatology of the western region of the Southwest Pacific Ocean (SWPO) basin (135°E–180°, 5°S–35°S) using observed and modeled data. The classification-based machine learning approach identifies the synoptic geophysical and aerosol environment favorable or unfavorable for TC intensification and intensity change prior to landfall, incorporating observational and satellite data. A multiple Poisson regression model with varying temporal monthly lags was used to build a relationship between the number of monthly TC days and basin-wide average dust aerosol optical depth (AOD), sea surface temperature (SST), and upper ocean temperature (UOT). This idea …
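A minimal sketch of a lagged Poisson regression follows, simplified to a single predictor (for example basin-average SST) rather than the three used in the dissertation. The function names are hypothetical, and the fit is plain Newton-Raphson, which for the canonical log link coincides with iteratively reweighted least squares (IRLS).

```python
import math

def lagged_pairs(x, y, lag):
    """Pair each month's response with the predictor observed `lag` months earlier."""
    return [(x[t - lag], y[t]) for t in range(lag, len(y))]

def fit_poisson(pairs, iters=50, tol=1e-12):
    """Newton-Raphson fit of log E[y] = b0 + b1 * x for Poisson responses."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0  # start from the intercept-only fit
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * x) for x in xs]
        # Score vector X^T (y - mu)
        g0 = sum(y - m for y, m in zip(ys, mu))
        g1 = sum(x * (y - m) for x, y, m in zip(xs, ys, mu))
        # Fisher information X^T diag(mu) X, a symmetric 2x2 matrix
        h00 = sum(mu)
        h01 = sum(x * m for x, m in zip(xs, mu))
        h11 = sum(x * x * m for x, m in zip(xs, mu))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1
```

Refitting with `lag` = 0, 1, 2, … and comparing the fits is one simple way to explore which monthly lag is most informative.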


Causal Inference And Prediction On Observational Data With Survival Outcomes, Xiaofei Chen Jul 2020

Statistical Science Theses and Dissertations

Infants with hypoplastic left heart syndrome require an initial Norwood operation, followed some months later by a stage 2 palliation (S2P). The timing of S2P is critical for the operation’s success and the infant’s survival, but the optimal timing, if one exists, is unknown. We attempt to estimate the optimal timing of S2P by analyzing data from the Single Ventricle Reconstruction Trial (SVRT), which randomized patients between two different types of Norwood procedure. In the SVRT, the timing of the S2P was chosen by the medical team; thus with respect to this exposure, the trial constitutes an observational study, and …


Bayesian Reliability Analysis For Optical Media Using Accelerated Degradation Test Data, Kun Bu Jun 2020

USF Tampa Graduate Theses and Dissertations

ISO (the International Organization for Standardization) 10995:2011 is the international standard providing guidelines for assessing the reliability and service life of optical media, which are designed to be highly reliable and to have long lifetimes. A well-known challenge of reliability analysis for highly reliable devices is that it is hard to obtain sufficient failure data under normal use conditions. Accelerated degradation tests (ADTs) are commonly used to quickly obtain physical degradation data under elevated stress conditions, which are then extrapolated to predict reliability under the normal use condition. This standard estimates the lifetime of recordable media, …


Research In Short Term Actuarial Modeling, Elijah Howells Jun 2020

Electronic Theses, Projects, and Dissertations

This paper covers mathematical methods used to conduct short-term actuarial analysis, such as policy deductible analysis, maximum covered loss analysis, and mixtures of distributions. Assessment of a loss variable's distribution under a policy deductible, under a maximum covered loss, and under both together will also be covered. The derivation, meaning, and use of cost per loss and cost per payment will be discussed, as will those of an aggregate sum distribution, stop loss policy, and maximum likelihood estimation. For each topic, special cases …
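To make cost per loss and cost per payment concrete, here is a sketch that assumes exponentially distributed losses with mean theta (a convenience assumption; the paper treats more general distributions), an ordinary deductible d, and a maximum covered loss u. The function name is hypothetical. The payment per loss is (X ∧ u) − (X ∧ d), and for the exponential E[X ∧ a] = theta(1 − e^(−a/theta)).

```python
import math

def exponential_costs(theta, deductible=0.0, max_covered_loss=math.inf):
    """Expected insurer cost for an exponential loss X with mean theta.

    Payment per loss is min(X, u) - min(X, d) for deductible d and
    maximum covered loss u. Returns (cost per loss, cost per payment).
    """
    d, u = deductible, max_covered_loss
    if not 0 <= d <= u:
        raise ValueError("require 0 <= deductible <= maximum covered loss")
    # E[X ^ u] - E[X ^ d] = theta * (exp(-d/theta) - exp(-u/theta))
    per_loss = theta * (math.exp(-d / theta) - math.exp(-u / theta))
    # Only losses above the deductible generate a payment: divide by P(X > d)
    per_payment = per_loss / math.exp(-d / theta)
    return per_loss, per_payment
```

With no maximum covered loss, memorylessness makes the cost per payment equal to theta regardless of the deductible, while the cost per loss shrinks by the factor e^(−d/theta).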


A Study Of CUSUM Statistics On Bitcoin Transactions, Ivan Perez May 2020

Theses and Dissertations

In this thesis, our objective is to study the relationship between transaction price and volume in the BTC/USD Coinbase exchange. In the second chapter, we develop a consecutive CUSUM algorithm to detect instantaneous changes in the arrival rate of market orders. We begin by estimating a baseline rate under the assumption of a locally time-homogeneous Poisson process. Using a chi-squared test, our observations lead us to reject the plausibility of a time-homogeneous Poisson model on a more global scale. We thus proceed to use CUSUM-based alarms to detect consecutive upward and downward changes in the arrival rate of …
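The CUSUM idea can be sketched for counts of market orders in equal-length intervals, assuming Poisson counts with a known baseline rate. This is a generic one-sided detector based on the Poisson log-likelihood ratio, not necessarily the thesis's consecutive algorithm; the post-change rate `rate1` and the alarm `threshold` are hypothetical tuning parameters.

```python
import math

def poisson_cusum(counts, rate0, rate1, threshold):
    """One-sided CUSUM for a shift in a Poisson rate.

    counts: number of market orders in consecutive equal-length intervals
    rate0, rate1: expected counts per interval before / after the shift;
        rate1 > rate0 detects an increase, rate1 < rate0 a decrease
    Returns the index of the first interval that raises an alarm, or None.
    """
    step = math.log(rate1 / rate0)
    s = 0.0
    for t, x in enumerate(counts):
        # Log-likelihood ratio of Poisson(rate1) vs Poisson(rate0) for count x
        s = max(0.0, s + x * step - (rate1 - rate0))
        if s > threshold:
            return t
    return None
```

Running one detector with `rate1` above the baseline and another below it gives separate upward and downward alarms; restarting after each alarm with a re-estimated baseline is one way to chain detections, though the thesis's exact scheme may differ.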


Evaluation Of The Utility Of Informative Priors In Bayesian Structural Equation Modeling With Small Samples, Hao Ma May 2020

Education Policy and Leadership Theses and Dissertations

The estimation of parameters in structural equation modeling (SEM) has been primarily based on the maximum likelihood estimator (MLE) and relies on large sample asymptotic theory. Consequently, the results of the SEM analyses with small samples may not be as satisfactory as expected. In contrast, informative priors typically do not require a large sample, and they may be helpful for improving the quality of estimates in the SEM models with small samples. However, the role of informative priors in the Bayesian SEM has not been thoroughly studied to date. Given the limited body of evidence, specifying effective informative priors remains …


Statistical Models And Analysis Of Univariate And Multivariate Degradation Data, Lochana Palayangoda May 2020

Statistical Science Theses and Dissertations

For degradation data in reliability analysis, estimation of the first-passage time (FPT) distribution to a threshold provides valuable information on reliability characteristics. Recently, Balakrishnan and Qin (2019; Applied Stochastic Models in Business and Industry, 35:571-590) studied a nonparametric method to approximate the FPT distribution of such degradation processes when the underlying process type is unknown. In this thesis, we propose improved techniques based on saddlepoint approximation, which improve upon their suggested methods. Numerical examples and Monte Carlo simulation studies are used to illustrate the advantages of the proposed techniques. Limitations of the improved techniques are discussed and some possible solutions …


Sensitivity Analysis For Incomplete Data And Causal Inference, Heng Chen May 2020

Statistical Science Theses and Dissertations

In this dissertation, we explore sensitivity analyses for three types of incomplete data problems: missing outcomes; missing outcomes together with missing predictors; and potential outcomes in the Rubin causal model (RCM). The first sensitivity analysis is conducted for the missing completely at random (MCAR) assumption in frequentist inference; the second for the missing at random (MAR) assumption in likelihood inference; and the third for a novel assumption, the “sixth assumption”, proposed for the robustness of the instrumental variable estimand in causal inference.


A Novel Approach To Updating Municipal Tax Parcel Impervious Surface Calculations, Patrick D. Muradaz May 2020

Senior Honors Projects, 2020-current

Accurate impervious surface calculations are important to many municipalities because high impervious surface density produces high volumes of surface rainwater runoff. Municipalities must deal with this runoff by establishing and maintaining drainage facilities. To help offset the added cost of these facilities, many municipalities impose taxes and fees on privately owned impervious surfaces such as homes, driveways, and patios. Currently, for a city like Harrisonburg to calculate tax parcel impervious surface density, aerial images must be either digitized manually or mapped with computer-based classification techniques that rely on predictive models. These methods of impervious surface calculations …


Boom Or Bust: Examining The Relationship Between High School Recruiting Rankings And The NFL Draft, Nicholas E. Tice Apr 2020

Senior Theses

The goal of this thesis is to model the probability that a high school football player is drafted, based on information taken from their recruiting profile. The response variable is binary and defined as drafted (1) or undrafted (0). The independent variables, including height, weight, position, hometown, recruiting grade, and other socioeconomic factors based on the player’s high school, were collected by scraping recruiting websites. 247Sports and ESPN were the two recruiting services used and compared in this study. Because of the binary nature of the dependent variable, logistic regression and decision trees were chosen …


An Actuarial Approach To Personal Injury Protection Severity, Jason Colgrove Mar 2020

Undergraduate Honors Theses

Insurance companies examine the risk of financial losses for their policyholders as a way to accurately price insurance policies. Within the automobile insurance sector, the frequency of crashes and the associated liabilities started to increase in late 2013 after having declined for close to a decade. This research focuses on possibly correlated variables that could lead to a better understanding of this change. To embark on this task, we teamed up with the Society of Actuaries, Casualty Actuarial Society, and the American Property Casualty Insurance Association to obtain data regarding frequency, severity, …


Gradient Boosting For Survival Analysis With Applications In Oncology, Nam Phuong Nguyen Jan 2020

USF Tampa Graduate Theses and Dissertations

Cancer is one of the deadliest diseases, and the world has been fighting it for decades. An enormous amount of research has been conducted, via a wide range of approaches from genetic analysis to mathematical modeling. Survival analysis is a well-established methodology frequently used to estimate the survival probability of a patient. Although a large number of methods for survival analysis exist, efficient exploration of a high-dimensional feature space has been challenging due to its computational cost and complexity. This thesis adapts the component-wise gradient boosting algorithms for cancer survival analysis, and also proposes a new …


Bayesian Approach To Finding The Most Likely Circuit Structure, Shannon Harms Jan 2020

Graduate Research Theses & Dissertations

A system's reliability depends on the reliabilities of the components it is composed of, and in this paper we want to find the system structure that is most likely given observed data. Bayesian methods were used to obtain the posterior means, or observed reliabilities, of both the components and the systems. Assuming the series and parallel system structures have independent components, we calculated system reliabilities from observed component reliabilities using the multiplication and addition probability rules. We are then able to expand upon the numerical comparison method through a maximum likelihood analysis that …
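The multiplication and complement rules for independent components can be sketched as follows, assuming pass/fail test data for each component and a Beta(1, 1) prior on each component's reliability (both assumptions; the function names are hypothetical). For a parallel system, the addition rule applied to "at least one component works" reduces to one minus the probability that every component fails.

```python
import math

def posterior_mean(successes, trials, a=1.0, b=1.0):
    """Posterior mean of a component reliability under a Beta(a, b) prior."""
    return (successes + a) / (trials + a + b)

def series_reliability(rels):
    """Every independent component must work: multiplication rule."""
    return math.prod(rels)

def parallel_reliability(rels):
    """At least one independent component must work: complement of all failing."""
    return 1.0 - math.prod(1.0 - r for r in rels)
```

Because the component posteriors are independent, plugging posterior means into these formulas gives the posterior mean of the system reliability for both the series and the parallel structure, which is what makes a direct numerical comparison of the two structures straightforward.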


An Examination Of COVID-19 Statistical Modeling, Shane Vaughan Jan 2020

Williams Honors College, Honors Research Projects

COVID-19, the disease caused by the 2019 novel coronavirus, is an infectious disease that was first reported in late 2019 and soon spread to become a global pandemic, prompting major action from world governments. Soon after, many institutions began attempts to analyze and predict the spread and severity of the disease via statistical modeling. Some information is not available for public consumption; however, a number of institutions have published the results of their analyses, and some have made public repositories of the code used to build the models. This research paper attempts to use these and other resources to examine the modeling …


Deriving Statistical Inference From The Application Of Artificial Neural Networks To Clinical Metabolomics Data, Kevin M. Mendez Jan 2020

Theses: Doctorates and Masters

Metabolomics data are complex, with a high degree of multicollinearity. As such, multivariate linear projection methods, such as partial least squares discriminant analysis (PLS-DA), have become standard. Non-linear projection methods, typified by Artificial Neural Networks (ANNs), may be more appropriate to model potential nonlinear latent covariance; however, they are not widely used due to the difficulty of deriving statistical inference, and thus biological interpretation. Herein, we illustrate the utility of ANNs for clinical metabolomics using publicly available data sets and develop an open framework for deriving and visualising statistical inference from ANNs equivalent to standard PLS-DA methods.