Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistics and Probability

Theses/Dissertations

2020

Articles 31 - 60 of 258

Full-Text Articles in Physical Sciences and Mathematics

A Bayesian Network Analysis Of The Human Exposure To Escherichia Coli Bacteria In The Environment, Brandon G. De Flon Nov 2020

Mathematics & Statistics ETDs

Diarrhea is a leading cause of death worldwide because a lack of household sanitation exposes humans to high concentrations of pathogenic Escherichia coli. In 2016, the University of New Mexico’s Nepal Study Center collected cross-sectional survey data using proportional random sampling in three communities in Western Nepal. Structure and parameter learning and approximate inference in Bayesian networks were used to study diarrheagenic E. coli exposure while incorporating participation in sanitary behaviors, access to sanitary built-in environments, and other human characteristics. Of the reported sickness, hand washing resulted in a 20 percent decrease, water treatment 8 percent, and both 28 percent. Of …


From Wave Propagation To Spin Dynamics: Mathematical And Computational Aspects, Oleksii Beznosov Nov 2020

Mathematics & Statistics ETDs

In this work we concentrate on two separate topics which pose certain numerical challenges. The first topic is the spin dynamics of electrons in high-energy circular accelerators. We introduce a stochastic differential equation framework to study spin depolarization and spin equilibrium. This framework allows the mathematical study of known equations and new equations modelling the spin distribution of an electron bunch. A spin distribution is governed by a so-called Bloch equation, which is a linear Fokker-Planck type PDE, in general posed in six dimensions. We propose three approaches to approximate solutions, using analytical and modern numerical techniques. We also present …


Multimodal Data Fusion And Attack Detection In Recommender Systems, Mehmet Aktukmak Nov 2020

USF Tampa Graduate Theses and Dissertations

Commercial platforms that use recommender systems can collect relevant information to produce useful recommendations for their users. However, these sources usually contain missing values, imbalanced and heterogeneous data, and noisy observations. Such characteristics render the process of exploiting the information nontrivial, as one should carefully address them during the data fusion process. In addition to the degenerative characteristics, some entries can be fake, i.e., they can be the outcomes of malicious intents to manipulate the system. These entries should be eliminated before incorporation into any recommendation task. Detecting such malicious attacks quickly and accurately and then mitigating them …


Numerical Study Of Gap Distributions In Determinantal Point Process On Low Dimensional Spheres: L-Ensemble Of O(N) Model Type For N = 2 And N = 3, Xiankui Yang Oct 2020

USF Tampa Graduate Theses and Dissertations

The Poisson point process is the best-known point process, with many applications. Unlike the Poisson point process, which is the random set of non-intersecting points, a determinantal point process refers to a certain class of point processes in which the points tend to interact with each other. The interaction often leads to more uniformly distributed points compared to those in a Poisson point process.

In this article, we study the gap distribution of a certain class of determinantal point processes, the L-ensemble of O(n) model type, and compare the distribution with those from the other known determinantal point processes that appear in random matrices. Our numerical …


Task Interrupted By A Poisson Process, Jarrett Christopher Nantais Oct 2020

Major Papers

We consider a task which has a completion time T (if not interrupted), which is a random variable with probability density function (pdf) f(t), t>0. Before it is complete, the task may be interrupted by a Poisson process with rate lambda. If that happens, then the task must begin again, with the same completion time random variable T, but with a potentially different realization. These interruptions can reoccur, until eventually the task is finished, with a total time of W. In this paper, we will find the Laplace Transform of W in several special cases.
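The model described above can be checked numerically. The following is a minimal sketch, not the author's derivation: it assumes the interruption process restarts with each attempt, and it uses a standard conditioning argument on the first attempt, which gives E[exp(-sW)] = (s + lambda) f*(s + lambda) / (s + lambda f*(s + lambda)), where f* denotes the Laplace transform of f. The function names `simulate_total_time` and `laplace_w` are hypothetical and chosen only for illustration.

```python
import math
import random


def simulate_total_time(draw_task_time, rate, rng):
    """Simulate one realization of the total time W.

    draw_task_time: callable taking an rng and returning a fresh draw of T.
    rate: Poisson interruption rate (lambda), assumed > 0.
    """
    total = 0.0
    while True:
        t = draw_task_time(rng)        # new realization of T for this attempt
        x = rng.expovariate(rate)      # time until the next interruption
        if t <= x:
            return total + t           # task finished before being interrupted
        total += x                     # time lost to the interrupted attempt


def laplace_w(f_star, s, rate):
    """Laplace transform of W from the first-attempt conditioning argument.

    f_star: callable returning the Laplace transform of the pdf f.
    """
    u = f_star(s + rate)
    return (s + rate) * u / (s + rate * u)


if __name__ == "__main__":
    # Assumed example: deterministic completion time T = 1, so f*(z) = exp(-z).
    rng = random.Random(0)
    s, lam, n = 0.7, 2.0, 20000
    mc = sum(
        math.exp(-s * simulate_total_time(lambda r: 1.0, lam, rng))
        for _ in range(n)
    ) / n
    print("Monte Carlo estimate:", mc)
    print("Closed form:", laplace_w(lambda z: math.exp(-z), s, lam))
```

As a sanity check on the formula, an exponentially distributed T with rate mu gives f*(z) = mu / (mu + z), and the expression reduces to mu / (mu + s): by memorylessness, restarting the task does not change the distribution of the total time.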


Categorical And Fuzzy Ensemble-Based Algorithms For Cluster Analysis, Bridget Nicole Manning Oct 2020

Theses and Dissertations

This dissertation focuses on improving multivariate methods of cluster analysis. In Chapter 3 we discuss methods relevant to the categorical clustering of tertiary data while Chapter 4 considers the clustering of quantitative data using ensemble algorithms. Lastly, in Chapter 5, future research plans are discussed to investigate the clustering of spatial binary data.

Cluster analysis is an unsupervised methodology whose results may be influenced by the types of variables recorded on observations. When dealing with the clustering of categorical data, solutions produced may not accurately reflect the structure of the process that generated them. Increased variability within the latent structure …


Incorporation And Measurement Of Uncertainty In Clustered And Spatial Data, Yuan Hong Oct 2020

Theses and Dissertations

Analyzing population-representative datasets for local estimation and predictions over time is important for monitoring related public health issues; however, there are many statistical challenges associated with such analyses. Mixed effect models are one of the common options: they can incorporate time and spatial effects, and the related inference is well established.

In the first part of this dissertation, to estimate area-level prevalence using individual-level data, small area estimation (SAE) with post-stratified mixed effect models was used, with sampling weights also incorporated. However, whether poststratification, which requires more computational effort, can improve estimation accuracy is …


Estimation And Inference Under Model Uncertainty, Yizheng Wei Oct 2020

Theses and Dissertations

Chapter 1 of this dissertation proposes a consistent and locally efficient estimator to estimate the model parameters for a logistic mixed effect model with random slopes. Our approach relaxes two typical assumptions: the random effects being normally distributed, and the covariates and random effects being independent of each other. Adhering to these assumptions is particularly difficult in health studies where in many cases we have limited resources to design experiments and gather data in long-term studies, while new findings from other fields might emerge, suggesting the violation of such assumptions. So it is crucial if we could have an estimator …


Role Of Influence In Complex Networks, Nur Dean Sep 2020

Dissertations, Theses, and Capstone Projects

Game theory is a wide-ranging research area that has attracted researchers from various fields. Scientists have been using game theory to understand the evolution of cooperation in complex networks. However, there is limited research that considers the structure and connectivity patterns in networks, which create heterogeneity among nodes. For example, due to the complex ways most networks are formed, it is common to have some highly “social” nodes, while others are highly isolated. This heterogeneity is measured through metrics referred to as “centrality” of nodes. Thus, the more “social” nodes tend to also have higher centrality.

In this thesis, …


Machine Learning Applications For Drug Repurposing, Hansaim Lim Sep 2020

Dissertations, Theses, and Capstone Projects

The cost of bringing a drug to market is astounding and the failure rate is intimidating. Drug discovery has been of limited success under the conventional reductionist model of one-drug-one-gene-one-disease paradigm, where a single disease-associated gene is identified and a molecular binder to the specific target is subsequently designed. Under the simplistic paradigm of drug discovery, a drug molecule is assumed to interact only with the intended on-target. However, small molecular drugs often interact with multiple targets, and those off-target interactions are not considered under the conventional paradigm. As a result, drug-induced side effects and adverse reactions are often neglected …


Comparison Of Longitudinal Changes In Resting State Functional Magnetic Resonance Imaging Between Alzheimer’S And Healthy Controls, Berk Can Yilmaz Aug 2020

Theses

Resting State Functional Magnetic Resonance Imaging (rs-fMRI) is a technique that is widely used for analyzing brain function using different approaches and methods. This study involves rs-fMRI analysis of Blood Oxygenation Level Dependent (BOLD) signals acquired from Alzheimer’s disease (AD) Patients and Healthy Controls (HC). Each subject in the study had both functional and anatomical images with at least one rs-fMRI scan with their Anatomical (T1) scans. Previous rs-fMRI studies have demonstrated that AD shows differences in Amplitude of Low Frequency (<0.1 Hz) Fluctuations (ALFF) and Regional Homogeneity (ReHo) measures relative to HCs.

The aim of the study is to investigate individual and group level differences using ReHo and mALFF related …


A Treatise Of Pd-Lgd Correlation Modelling, Wisdom S. Avusuglo Wsa Aug 2020

Electronic Thesis and Dissertation Repository

The provision in Paragraph 468 of Basel II Framework Document for calculating loss given default (LGD) requires that parameters used in Pillar I of Basel II capital estimations must be reflective of economic downturn conditions so that relevant risks are accounted for. This provision is based on the fact that the probability of default (PD) and LGD correlations are not captured in the proposed formula for estimating economic capital. To help quantify economic downturn LGD, the Basel Committee proposed establishing a functional relationship between long-run and downturn LGD.

To the best of our knowledge, the current proposed models that map …


Evaluation Of China Shipping Hub-And-Spoke Network Based On Herfindahl-Hirschmann Index (Hhi), Wenjin Sun Aug 2020

World Maritime University Dissertations

No abstract provided.


Ranking Comments: An Entropy-Based Method With Word Embedding Clustering, Yuyang Zhang Aug 2020

Electronic Thesis and Dissertation Repository

Automatically ranking comments by their relevance plays an important role in the areas of text mining and text summarization. In this thesis, firstly, we introduce a new text digitalization method: the bag of word clusters model. Unlike the traditional bag of words model that treats each word as an independent item, we group semantically related words into clusters using pre-trained word2vec word embeddings and represent each comment as a distribution of word clusters. This method can extract both semantic and statistical information from texts. Next, we propose an unsupervised ranking algorithm that identifies relevant comments by their distance to the “ideal” comment. The …
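The idea of representing a comment as a distribution over word clusters can be illustrated with a small sketch. Everything here is an assumption made for illustration: the word-to-cluster mapping is a hand-written stand-in for clusters that the thesis derives from word2vec embeddings, the function names are hypothetical, and the Shannon entropy at the end is only one plausible summary of such a distribution, not the ranking algorithm the thesis proposes.

```python
import math
from collections import Counter


def cluster_distribution(tokens, word_to_cluster, n_clusters):
    """Map tokens to cluster ids and return the normalized cluster histogram.

    Tokens missing from word_to_cluster are ignored.
    """
    counts = Counter(word_to_cluster[w] for w in tokens if w in word_to_cluster)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * n_clusters      # no known words: empty representation
    return [counts.get(k, 0) / total for k in range(n_clusters)]


def shannon_entropy(dist):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)


if __name__ == "__main__":
    # Hypothetical clusters: 0 = animals, 1 = vehicles.
    mapping = {"cat": 0, "dog": 0, "car": 1, "truck": 1}
    comment = "the cat chased the dog past a car".split()
    dist = cluster_distribution(comment, mapping, n_clusters=2)
    print(dist, shannon_entropy(dist))
```

With a representation like this, two comments that use different but related words ("cat" and "dog") fall in the same cluster and therefore look similar, which a plain bag of words would not capture.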


Wavelet Coherence Analysis With An Application Of Brain Images, Yiqian Fang Aug 2020

Arts & Sciences Electronic Theses and Dissertations

Wavelet analysis has become an emerging method in a wide range of applications with non-stationary data. In this work, we apply wavelets to tackle the problem of estimating dynamic association in a collection of multivariate non-stationary time series. Coherence is a common metric for linear dependence across signals. However, it assumes static dependence and does not sufficiently model many biological processes with time-evolving dependence structures. We explore continuous wavelet analysis for modeling and estimating such dynamic dependence under the replicated multivariate time series settings. Wavelet transformation provides a decomposition of signals that localizes in both time and frequency domains, hence …


Renewable-Energy Resources, Economic Growth And Their Causal Link, Yiyang Chen Aug 2020

Electronic Thesis and Dissertation Repository

This thesis examines the presence and strength of predictive causal relationship between renewable energy prices and economic growth. We look for evidence by investigating the cases of Norway, New Zealand, and Canada’s two provinces of Alberta and Ontario. The usual vector autoregressive model (VAR) and its various improved versions still assume constant parameters over time. We devise a Markov-switching VAR (MS-VAR) model in order to accommodate the observed time-dependent causal relation changes. Our proposed modelling approach is induced by the hidden Markov model methodologies in terms of an online parameter estimation through recursive filtering. The parameters of the MS-VAR model are governed by …


Snow-Albedo Feedback In Northern Alaska: How Vegetation Influences Snowmelt, Lucas C. Reckhaus Aug 2020

Theses and Dissertations

This paper investigates how the snow-albedo feedback mechanism of the Arctic is changing in response to rising climate temperatures. Specifically, it examines the interplay of vegetation and snowmelt and how these two variables can be correlated. This has the potential to refine climate modelling of the spring transition season. Research was conducted at the ecoregion scale in northern Alaska from 2000 to 2020. Each ecoregion is defined by distinct topographic and ecological conditions, allowing for meaningful contrast between the patterns of spring albedo transition across surface conditions and vegetation types. The five most northerly ecoregions of Alaska are chosen as they encompass …


The Influence Of Environmental Variables On The Height Growth Of Loblolly Pine (Pinus Taeda) In The Western Gulf, Osakpamwan Edo-Iyasere Aug 2020

Electronic Theses and Dissertations

Understanding the effects of environmental factors on stand growth is important in optimizing forest management plans. This study investigated the effects of soil and climate factors on the height growth (site index) of loblolly pine (Pinus Taeda L.) using data collected from permanent plots established in intensively-managed plantations across East Texas and Western Louisiana. The Chapman-Richards model was selected as the base model to describe the height-age relationships and important soil and climate variables were incorporated into the models as model parameter coefficient adjustors. Our results showed that the most important factors for predicting site index were nitrogen …


Classification-Based Method For Estimating Dynamic Treatment Regimes, Junwei Shen Aug 2020

Electronic Thesis and Dissertation Repository

Dynamic treatment regimes are sequential decision rules dictating how to individualize treatments to patients based on evolving treatments and covariate history. In this thesis, we investigate two methods of estimating dynamic treatment regimes. The first method extends outcome weighted learning from two-treatments to multi-treatments and allows for negative treatment outcome. We show that under two different sets of assumptions, the Fisher consistency can be maintained. The second method estimates treatment rules by a neural classification tree. A weighted squared loss function is defined to approximate the indicator function to maintain the smoothness. A method of tree reconstruction and pruning is …


Cell Assembly Detection In Low Firing-Rate Spike Train Data, Phan Minh Duc Truong Aug 2020

Mathematics Theses and Dissertations

Cell assemblies, defined as groups of neurons forming temporal spike coordination, are thought to be fundamental units supporting major cognitive functions. However, detecting cell assemblies is challenging since they can occur at a range of time scales and with a range of precisions, from synchronous spikes to co-variations in firing rate. In this dissertation, we use a recently published cell assembly detection (CAD) algorithm that is capable of detecting assemblies at a range of time scales and precisions. We first showed that the CAD method can be applied to sparser spike train data than has previously been reported. This …


Machine-Learning-Based Prediction Of Sepsis Events From Vertical Clinical Trial Data: A Naïve Approach, Tyler Michael Gaddis Aug 2020

Theses and Dissertations

Sepsis is a potentially life-threatening condition characterized by a dysregulated, disproportionate immune response to infection by which the afflicted body attacks its own tissues, sometimes to the point of organ failure, and in the worst cases, death. According to the Centers for Disease Control and Prevention (CDC), sepsis is reported to kill upwards of 270,000 Americans annually, though this figure may be greater given certain ambiguities in the currently accepted diagnostic framework of the disease.

This study attempted to first establish an understanding of past definitions of sepsis, and to then recommend use of machine learning as integral in an …


A Novel Correction For The Adjusted Box-Pierce Test — New Risk Factors For Emergency Department Return Visits Within 72 Hours For Children With Respiratory Conditions — General Pediatric Model For Understanding And Predicting Prolonged Length Of Stay, Sidy Danioko Aug 2020

Computational and Data Sciences (PhD) Dissertations

This thesis represents the results of three research projects that underline the breadth and depth of my interests.

Firstly, I devoted some effort to the well-known Box-Pierce goodness-of-fit tests for time series models, which have been an important research topic over the last few decades. All previously proposed tests focus on changes to the test statistics. Instead, I adopted a different approach that takes the best-performing test and modifies the rejection region. Thus, I developed a semiparametric correction of the Adjusted Box-Pierce test that attains the best type I error rates for all sample sizes and lags and outperforms …


A Bayesian Markov Chain Monte Carlo Approach To Uncertainty Quantification, Matthew Isaac Aug 2020

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Uncertainty quantification (UQ) is a framework used frequently in engineering analyses to understand how uncertainty in system inputs leads to uncertainty in the system output. An instability is observed in a UQ method proposed by Roy and Oberkampf, and a Bayesian Markov Chain Monte Carlo approach to UQ is offered as an alternative. The Bayesian approach allows analysts to incorporate information from various available sources, including observed measurements and expert opinion, and to update the analysis and results as more information becomes available. An illustrative engineering example is provided as a platform to demonstrate the Bayesian UQ approach and to …


Estimating Distortion Risk Measures Under Truncated And Censored Data Scenarios, Sahadeb Upretee Aug 2020

Theses and Dissertations

In insurance data analytics and actuarial practice, a broad class of risk measures, known as distortion risk measures, is used to capture the riskiness of the distribution tail. Point and interval estimates of the risk measures are then employed to price extreme events, to develop reserves, to design risk transfer strategies, and to allocate capital. When solving such problems, the main statistical challenge is to choose an appropriate estimate of a risk measure and to assess its variability. In this context, the empirical …


Statistical Methods For Resolving Intratumor Heterogeneity With Single-Cell Dna Sequencing, Alexander Davis Aug 2020

Dissertations & Theses (Open Access)

Tumor cells have heterogeneous genotypes, which drives progression and treatment resistance. Such genetic intratumor heterogeneity plays a role in the process of clonal evolution that underlies tumor progression and treatment resistance. Single-cell DNA sequencing is a promising experimental method for studying intratumor heterogeneity, but brings unique statistical challenges in interpreting the resulting data. Researchers lack methods to determine whether sufficiently many cells have been sampled from a tumor. In addition, there are no proven computational methods for determining the ploidy of a cell, a necessary step in the determination of copy number. In this work, software for calculating probabilities from …


Analyzing The Fractal Dimension Of Various Musical Pieces, Nathan Clark Aug 2020

Industrial Engineering Undergraduate Honors Theses

One of the most common tools for evaluating data is regression. This technique, widely used by industrial engineers, explores linear relationships between predictors and the response. Each observation of the response is a fixed linear combination of the predictors with an added error element. The method is built on the assumption that this error is normally distributed across all observations and has a mean of zero. In some cases, it has been found that the inherent variation is not the result of a random variable, but is instead the result of self-symmetric properties of the observations. For data with these …


Marginal Methods And Software For Clustered Data With Cluster- And Group-Size Informativeness., Mary Elizabeth Gregg Aug 2020

Electronic Theses and Dissertations

Clustered data result when observations have some natural organizational association. In such data, cluster size is defined as the number of observations belonging to a cluster. A phenomenon termed informative cluster size (ICS) occurs when observation outcomes vary in a systematic way related to the cluster size. An additional form of informativeness, termed informative within-cluster group size (IWCGS), arises when the distribution of group-defining categorical covariates within clusters similarly carries information related to outcomes. Standard methods for the marginal analysis of clustered data can produce biased estimates and inference when data have informativeness. A reweighting methodology has been developed that …


Linear Methods For Regression With Small Sample Sizes Relative To The Number Of Variables., Rajesh Sikder Aug 2020

Electronic Theses and Dissertations

In data sets with a small number of observations but a large number of variables recorded for each observation, ordinary least squares estimation cannot be used for regression models. There are many alternatives, including stepwise regression, penalized methods such as ridge regression and the LASSO, and methods based on derived inputs such as principal components regression and partial least squares regression. In this thesis, these five methods are described. K-fold cross validation is also discussed as a way of determining regularization parameters for each method. The performance of these methods in estimation and prediction is also examined through …


A Geochemical And Statistical Investigation Of The Big Four Springs Region In Southern Missouri, Jordan Jasso Vega Aug 2020

MSU Graduate Theses

The Big Four Springs region hosts four major first-order magnitude springs in southern Missouri and northern Arkansas. These springs are Big Spring (Carter County, MO), Greer Spring (Oregon County, MO), Mammoth Spring (Fulton County, AR), and Hodgson Mill Spring (Ozark County, MO). Based on historic dye traces and hydrogeological investigations, these springs drain an area of approximately 1500 square miles and collectively discharge an average of 780 million gallons of water per day. The rocks found in the Big Four Springs region are, from youngest to oldest, the Cotter and Jefferson City Dolomite (Ordovician), Roubidoux Formation (Ordovician), Gasconade Dolomite …


Evaluating Particulate Matter 2.5 In The Yangtze River Delta, Muhammad Abdullah Aug 2020

MSU Graduate Theses

Particulate Matter 2.5 (PM2.5) is a growing concern in industrialized countries. In China, high concentrations of PM2.5 are causing devastating health and environmental effects for the people living there. Coal-burning for domestic and industrial purposes is the main culprit for decreasing air quality in China. The focus of this paper is on the Yangtze River Delta (YRD) located on the eastern coast of China. Hourly PM2.5 readings from March 2015 to June 2016 were obtained from 125 air quality monitoring stations (AQMS) in 23 cities in the YRD. In this study, PM2.5 readings were examined …