Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistics and Probability

Theses/Dissertations

2019

Articles 61 - 90 of 253

Full-Text Articles in Physical Sciences and Mathematics

Statistical Learning Of Biomedical Non-Stationary Signals And Quality Of Life Modeling, Mahdi Goudarzi Jul 2019

USF Tampa Graduate Theses and Dissertations

Statistical learning is a set of tools for modeling and understanding complex datasets. It is a recently developed area in statistics and blends with parallel developments in computer science and, in particular, machine learning.

The classification of biomedical non-stationary signals such as the electroencephalogram (EEG) is always a challenging problem due to their complexity. Low spatial resolution on the scalp, the curse of dimensionality, and a poor signal-to-noise ratio are disadvantages of working with biomedical signals. EEG signals are unstructured data that need preprocessing steps to extract informative features that are measurable and predictive. In the first two chapters of this dissertation, EEG …


Allocative Poisson Factorization For Computational Social Science, Aaron Schein Jul 2019

Doctoral Dissertations

Social science data often comes in the form of high-dimensional discrete data such as categorical survey responses, social interaction records, or text. These data sets exhibit high degrees of sparsity, missingness, overdispersion, and burstiness, all of which present challenges to traditional statistical modeling techniques. The framework of Poisson factorization (PF) has emerged in recent years as a natural way to model high-dimensional discrete data sets. This framework assumes that each observed count in a data set is a Poisson random variable $y \sim \mathrm{Pois}(\mu)$ whose rate parameter $\mu$ is a function of shared model parameters. This thesis examines a specific …
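For readers unfamiliar with the framework, one common instance of Poisson factorization is the matrix case sketched below. The index and factor symbols ($i$, $j$, $k$, $\theta$, $\phi$) are assumptions introduced here for illustration and are not taken from the abstract, which only states that the rate is a function of shared parameters.

```latex
% A minimal sketch of Poisson matrix factorization (notation assumed):
% each count y_{ij} has a rate built from K shared non-negative factors.
\begin{align}
  y_{ij} &\sim \mathrm{Pois}(\mu_{ij}), \\
  \mu_{ij} &= \sum_{k=1}^{K} \theta_{ik}\,\phi_{kj},
  \qquad \theta_{ik} \ge 0,\ \phi_{kj} \ge 0 .
\end{align}
```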


Methods For Making Policy-Relevant Forecasts Of Infectious Disease Incidence, Stephen A. Lauer Jul 2019

Doctoral Dissertations

Infectious diseases place an enormous burden on the people of the developing world and their governments. When, where, and how to allocate resources in order to slow the spread of a virus or deal with the aftermath of an outbreak is often the responsibility of local public health officials. In this thesis, we develop statistical methods for forecasting future incidence of infectious diseases and estimating the effects of interventions designed to reduce future incidence, bearing in mind the needs and concerns of those public health officials. While most infectious disease forecasting models focus on short-term horizons (i.e. weeks or …


Statistical Analysis Of Interval-Censored Data Subject To Additional Complications, Qiang Zheng Jul 2019

Theses and Dissertations

Survival analysis is an important branch of statistics that studies time-to-event data (or survival data), in which the response variable is the time to a certain event of interest. The most prominent feature of survival data is that the response is not exactly observed due to limits of the study design or the nature of the event of interest. Interval-censored data are a common type of survival data and occur frequently in real-life studies where subjects are examined at periodic follow-ups. The response time is usually not observed, but the status of the event of interest is known …


Estimation Problems For Pooled Data, Xichen Mou Jul 2019

Theses and Dissertations

In epidemiological applications, individual specimens (e.g., blood, urine, etc.) are often pooled together to detect the presence of disease or to measure the concentration level of a specific biomarker. Due to the advantage of cost efficiency, pooled data are also seen in diverse areas such as genetics, animal ecology, and environmental science. With pooled data, individual observations are masked and new statistical methods are needed to estimate characteristics such as disease prevalence, the underlying density function of a biomarker, etc. We focus on three estimation problems for pooled data. Chapters 2 and 3 propose nonparametric estimators for the density function …


Multivariate Probit Models For Interval-Censored Failure Time Data, Yifan Zhang Jul 2019

Theses and Dissertations

Survival analysis is an important branch of statistics that analyzes time-to-event data. The events of interest can be death, disease occurrence, the failure of a machine part, etc. One important feature of this type of data is censoring: information on the time to event is not observed exactly due to loss to follow-up or non-occurrence of the event of interest before the trial ends. Censored data are commonly observed in clinical trials and epidemiological studies, since monitoring a person’s health over time after treatment is often required in medical or health studies. In this dissertation we focus on studying multivariate …


Extension Of Risk-Based Measure Of Time-Varying Prognostic Discrimination For Survival Models, Shujie Chen Jul 2019

Theses and Dissertations

The Cox proportional hazards (PH) model and time dependent PH model are the most popular survival models in survival analysis. The hazard discrimination summary HDS(t) proposed by Liang and Heagerty [2017] is used to evaluate the mean hazard difference between cases and controls at time t. Liang and Heagerty [2017] evaluated the discrimination performance under the PH model and time dependent PH model with right censoring.

In this thesis, we first investigate their method further via comprehensive simulations, including: 1) we extend the simulation in Liang and Heagerty [2017] under the PH model by adding more scenarios such as different …


Investigations On Multiple Interval Estimators, Taeho Kim Jul 2019

Theses and Dissertations

Multiple interval estimation for a set of parameters is investigated. To begin, a strategy of optimization for a multiple interval estimator (MIE) is introduced. This approach allocates distinct optimized levels to individual interval estimators so that the global expected content can be minimized while the global coverage probability is still maintained at a global level. This optimal allocation is achieved by a decision theoretic procedure which consists of two global risk functions. The major part of this manuscript is devoted to two multiple interval estimation procedures. Both procedures adopt prior information added to the classical setting, but these procedures do …


Copula-Based Zero-Inflated Count Time Series Models, Mohammed Sulaiman Alqawba Jul 2019

Mathematics & Statistics Theses & Dissertations

Count time series data are observed in several applied disciplines such as in environmental science, biostatistics, economics, public health, and finance. In some cases, a specific count, say zero, may occur more often than usual. Additionally, serial dependence might be found among these counts if they are recorded over time. Overlooking the frequent occurrence of zeros and the serial dependence could lead to false inference. In this dissertation, we propose two classes of copula-based time series models for zero-inflated counts with the presence of covariates. Zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), and zero-inflated Conway-Maxwell-Poisson (ZICMP) distributed marginals of the …


A Computationally Efficient Methodology In Pricing A Guaranteed Minimum Accumulation Benefit, Yiming Huang Jun 2019

Electronic Thesis and Dissertation Repository

In this thesis, we consider a framework under which three correlated factors, namely, financial, mortality and lapse risks, are modelled in an integrated way. This modelling framework supports the valuation of a guaranteed minimum accumulation benefit (GMAB). The change-of-measure approach is employed to derive compact and implementable valuation expressions. We provide a numerical demonstration to confirm the efficiency and accuracy of our proposed pricing methodology. In particular, our approach on average takes only 0.07% of the computing time entailed by the Monte-Carlo (MC) simulation technique. Furthermore, the standard errors of our approach’s results are lower than those …


Cocyclic Hadamard Matrices: An Efficient Search Based Algorithm, Jonathan S. Turner Jun 2019

Theses and Dissertations

This dissertation serves as the culmination of three papers. “Counting the decimation classes of binary vectors with relatively prime fixed-density” presents the first non-exhaustive decimation class counting algorithm. “A Novel Approach to Relatively Prime Fixed Density Bracelet Generation in Constant Amortized Time” presents a novel lexicon for binary vectors based upon the Discrete Fourier Transform, and develops a bracelet generation method based upon the same. “A Novel Legendre Pair Generation Algorithm” expands upon the bracelet generation algorithm and includes additional constraints imposed by Legendre Pairs. It further presents an efficient sorting and comparison algorithm based upon symmetric functions, as well …


Field Drilling Data Cleaning And Preparation For Data Analytics Applications, Daniel Cardoso Braga Jun 2019

LSU Master's Theses

Throughout the history of oil well drilling, service providers have been continuously striving to improve performance and reduce total drilling costs to operating companies. Despite constant improvement in tools, products, and processes, data science has not played a large part in oil well drilling. With the implementation of data science in the energy sector, companies have come to see significant value in efficiently processing the massive amounts of data produced by the multitude of Internet of Things (IoT) sensors at the rig. The scope of this project is to combine academia and industry experience to analyze data from 13 different …


Optimizing Electrospun Ceramic Nanofiber Strength Through Two-Step Sintering, Michael Ross Jun 2019

Materials Engineering

Two-step sintering (TSS) consists of a high-temperature step and immediate cooling to a sintering temperature for an extended sintering time, where grain growth is suppressed by severe densification during the high-temperature step. TSS is adopted to enhance mechanical properties of electrospun ceramic nanofibers (CNFs), a class of porous ceramics used for environmental remediation, optoelectronics, and filtration. PVP and Ga(NO3)3 nanofiber mesh, provided by Lawrence Livermore National Laboratory, was shaped, oxidized, and two-step sintered to form a nanocrystalline β-Ga2O3 CNF tube using a high-temperature step of 1,000 °C. Sintering temperatures and times varied from …


The Martingale Approach To Financial Mathematics, Jordan M. Rowley Jun 2019

Master's Theses

In this thesis, we will develop the fundamental properties of financial mathematics, with a focus on establishing meaningful connections between martingale theory, stochastic calculus, and measure-theoretic probability. We first consider a simple binomial model in discrete time, and assume the impossibility of earning a riskless profit, known as arbitrage. Under this no-arbitrage assumption alone, we stumble upon a strange new probability measure Q, according to which every risky asset is expected to grow as though it were a bond. As it turns out, this measure Q also gives the arbitrage-free pricing formula for every asset on our market. In …
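The measure Q described in this abstract can be made concrete in a one-period binomial model. The sketch below is a textbook illustration under stated assumptions, not the thesis's own development: the function names and the numbers (up factor 2, down factor 0.5, interest rate 25%, initial price 4, strike 5) are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def risk_neutral_probability(u, d, r):
    """Probability of the up move under Q in a one-period binomial model.

    u and d are the up and down factors of the risky asset and r is the
    one-period interest rate. No arbitrage requires d < 1 + r < u.
    """
    if not d < 1 + r < u:
        raise ValueError("no-arbitrage requires d < 1 + r < u")
    return (1 + r - d) / (u - d)


def price(payoff_up, payoff_down, u, d, r):
    """Arbitrage-free price: the discounted expected payoff under Q."""
    q = risk_neutral_probability(u, d, r)
    return (q * payoff_up + (1 - q) * payoff_down) / (1 + r)


# Hypothetical market parameters, for illustration only.
s0, u, d, r = 4.0, 2.0, 0.5, 0.25
q = risk_neutral_probability(u, d, r)

# Under Q the stock itself grows, in expectation, like the bond,
# so its discounted expected value recovers today's price s0.
stock_check = price(s0 * u, s0 * d, u, d, r)

# A call option with (hypothetical) strike 5 is priced the same way.
call_price = price(max(s0 * u - 5.0, 0.0), max(s0 * d - 5.0, 0.0), u, d, r)
```

With these numbers the up probability under Q is 0.5, the discounted expected stock value equals the initial price of 4, and the call is worth 1.2.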


Implementation Of Multivariate Artificial Neural Networks Coupled With Genetic Algorithms For The Multi-Objective Property Prediction And Optimization Of Emulsion Polymers, David Chisholm Jun 2019

Master's Theses

Machine learning has been gaining popularity over the past few decades as computers have become more advanced. On a fundamental level, machine learning consists of the use of computerized statistical methods to analyze data and discover trends that may not have been obvious or otherwise observable previously. These trends can then be used to make predictions on new data and explore entirely new design spaces. Methods vary from simple linear regression to highly complex neural networks, but the end goal is similar. The application of these methods to material property prediction and new material discovery has been of high interest …


An Epidemiological Model With Simultaneous Recoveries, Ariel B. Farber Jun 2019

Electronic Theses and Dissertations

Epidemiological models are an essential tool in understanding how infection spreads throughout a population. Exploring the effects of varying parameters provides insight into the driving forces of an outbreak. In this thesis, an SIS (susceptible-infectious-susceptible) model is built partnering simulation methods, differential equations, and transition matrices with the intent to describe how simultaneous recoveries influence the spread of a disease in a well-mixed population. Individuals in the model transition between only two states; an individual is either susceptible — able to be infected, or infectious — able to infect others. Events in this model (infections and recoveries) occur by way …
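As background for the two-state structure described here, the sketch below integrates the standard deterministic SIS equation, in which recoveries happen independently at a constant rate. It is not the thesis's model: the simultaneous-recovery mechanism is not described in enough detail in this abstract to reproduce, and the parameter names and values (`beta`, `gamma`, `n`, `i0`) are assumptions chosen for illustration.

```python
def simulate_sis(beta, gamma, n, i0, dt=0.01, steps=20000):
    """Forward-Euler integration of the classic SIS equation

        dI/dt = beta * I * (n - I) / n - gamma * I,

    where I is the number of infectious individuals, n - I the number of
    susceptible individuals, beta the transmission rate, and gamma the
    (independent, per-individual) recovery rate.
    """
    i = float(i0)
    for _ in range(steps):
        i += dt * (beta * i * (n - i) / n - gamma * i)
    return i


# Hypothetical parameters: transmission outpaces recovery (beta > gamma),
# so the infection settles at an endemic level instead of dying out.
beta, gamma, n = 0.5, 0.2, 1000
final_infected = simulate_sis(beta, gamma, n, i0=1)

# Closed-form endemic equilibrium of the classic model: n * (1 - gamma / beta).
endemic = n * (1 - gamma / beta)
```

In this baseline the simulated number of infectious individuals approaches the endemic equilibrium of 600, which gives a reference point against which a different recovery mechanism could be compared.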


Statistical Machine Learning Methods For Mining Spatial And Temporal Data, Fei Tan May 2019

Dissertations

Spatial and temporal dependencies are ubiquitous properties of data in numerous domains. The popularity of spatial and temporal data mining has thus grown with the increasing prevalence of massive data. The presence of spatial and temporal attributes not only provides complementary useful perspectives, but also poses new challenges to the representation and integration into the learning procedure. In this dissertation, the involved spatial and temporal dependencies are explored with three genres: sample-wise, feature-wise, and target-wise. A family of novel methodologies is developed accordingly for the dependency representation in respective scenarios.

First, dependencies among discrete, continuous and repeated observations are studied …


A Review Of Statistical Analysis Of Genetic Case-Control Data, Jin Zhang May 2019

Major Papers

This paper considers the analysis of genetic case-control data. We consider the allele frequency in cases and controls. Because each individual has two alleles at any autosomal locus, there will be twice as many alleles as people in the allele distribution. Simultaneously, the serological distribution is built by ignoring the difference between homozygous and heterozygous genotypes. We also consider the marker loci with multiple alleles. Traditional case-control studies provide a powerful and efficient method for evaluating the association between a candidate gene and disease. There has been debate on how the power of tests for association changes with different allelic effects. To …


Probabilistic And Statistical Prediction Models For Alzheimer’s Disease And Statistical Analysis Of Global Warming, Maryam Ibrahim Habadi May 2019

USF Tampa Graduate Theses and Dissertations

The importance and applicability of data-driven statistical models have increased significantly. In the current study, we have utilized statistical techniques in interdisciplinary research spanning environmental and health topics.

Environmentally, global warming is considered one of the critical issues facing our planet. It is the increase in average global temperatures caused mostly by increases in carbon dioxide (CO2). The excessive rise of carbon dioxide above the average level, a side effect of the industrial revolution, has a significant impact on blocking heat and increasing the temperature within the Earth’s atmosphere. Based on the record of total CO2 emissions …


Detection Of Sand Boils From Images Using Machine Learning Approaches, Aditi S. Kuchi May 2019

University of New Orleans Theses and Dissertations

Levees provide protection for vast amounts of commercial and residential properties. However, these structures degrade over time, due to the impact of severe weather, sand boils, subsidence of land, seepage, etc. In this research, we focus on detecting sand boils. Sand boils occur when water under pressure wells up to the surface through a bed of sand. These make levees especially vulnerable. Object detection is a good approach to confirm the presence of sand boils from satellite or drone imagery, which can be utilized to assist in the automated levee monitoring methodology. Since sand boils have distinct features, applying object …


Advances In Measurement Error Modeling, Linh Nghiem May 2019

Statistical Science Theses and Dissertations

Measurement error in observations is widely known to cause bias and a loss of power when fitting statistical models, particularly when studying distribution shape or the relationship between an outcome and a variable of interest. Most existing correction methods in the literature require strong assumptions about the distribution of the measurement error, or rely on ancillary data which is not always available. This limits the applicability of these methods in many situations. Furthermore, new correction approaches are also needed for high-dimensional settings, where the presence of measurement error in the covariates adds another level of complexity to the desirable structure …


The Long-Run Effects Of Tropical Cyclones On Infant Mortality, Isabel Miranda May 2019

Master's Theses

In the United States alone, each tropical cyclone causes an average of $14.6 billion worth of damages. In addition to the destruction of physical infrastructure, natural disasters also negatively impact human capital formation. These losses are often more difficult to observe and are therefore overlooked when quantifying the true costs of natural disasters. One particular effect is an increase in infant mortality rates, an important indicator of a country’s general socioeconomic level. This paper utilizes a model created by Anttila-Hughes and Hsiang that takes advantage of annual variation in tropical cyclones using annual spatial average maximum wind speeds and …


Samples, Unite! Understanding The Effects Of Matching Errors On Estimation Of Total When Combining Data Sources, Benjamin Williams May 2019

Statistical Science Theses and Dissertations

Much recent research has focused on methods for combining a probability sample with a non-probability sample to improve estimation by making use of information from both sources. If units exist in both samples, it becomes necessary to link the information from the two samples for these units. Record linkage is a technique to link records from two lists that refer to the same unit but lack a unique identifier across both lists. Record linkage assigns a probability to each potential pair of records from the lists so that principled matching decisions can be made. Because record linkage is a probabilistic …


Variational Inference For Quantile Regression, Bufei Guo May 2019

Arts & Sciences Electronic Theses and Dissertations

Quantile regression (QR) (Koenker and Bassett, 1978) is an alternative to classic linear regression with extensive applications in many fields. This thesis studies Bayesian quantile regression (Yu and Moyeed, 2001) using variational inference, which is one of the alternative methods to Markov chain Monte Carlo (MCMC) for approximating intractable posterior distributions. Lasso regularization has been shown to be effective in improving the accuracy of quantile regression (Li and Zhu, 2008). This thesis develops variational inference for quantile regression and for regularized quantile regression with the lasso penalty. Simulation results show that variational inference is a computationally more efficient alternative …
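To make the objective behind quantile regression concrete, the sketch below implements the standard check (pinball) loss and shows, on toy data, that the constant minimizing it is the corresponding sample quantile. It illustrates only the classical loss, not the Bayesian variational method developed in the thesis; the function names and the toy data are assumptions introduced here.

```python
def check_loss(residuals, tau):
    """Average check loss rho_tau(u) = u * (tau - 1[u < 0]).

    Positive residuals are weighted by tau and negative residuals by
    (1 - tau), so tau = 0.5 gives half the mean absolute error.
    """
    total = 0.0
    for u in residuals:
        total += u * (tau - (1.0 if u < 0 else 0.0))
    return total / len(residuals)


def best_constant(y, tau, candidates):
    """Return the candidate constant with the smallest check loss on y.

    This is an intercept-only quantile regression solved by brute force.
    """
    return min(candidates, key=lambda c: check_loss([v - c for v in y], tau))


# Toy data (hypothetical): the values 1, 2, ..., 101.
y = [float(i) for i in range(1, 102)]
candidates = [float(c) for c in range(1, 102)]

# The minimizer of the 0.9 check loss is the 90th percentile of y.
q90 = best_constant(y, 0.9, candidates)
print(q90)  # → 91.0
```

A full quantile regression replaces the constant with a linear predictor and minimizes the same loss over its coefficients; the lasso variant mentioned in the abstract adds an L1 penalty on those coefficients.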


Decision Trees And Their Application For Classification And Regression Problems, Obinna Chilezie Njoku May 2019

MSU Graduate Theses

Tree methods are some of the best and most commonly used methods in the field of statistical learning. They are widely used in classification and regression modeling. This thesis introduces the concept and focuses more on decision trees such as Classification and Regression Trees (CART) used for classification and regression predictive modeling problems. We also introduced some ensemble methods such as bagging, random forest and boosting. These methods were introduced to improve the performance and accuracy of the models constructed by classification and regression tree models. This work also provides an in-depth understanding of how the CART models are constructed, …


Quantifying Lithochemical Diversity Of Martian Materials Using Hierarchical Clustering And A Similarity Index For Classification, Michael Conner Bouchard May 2019

Arts & Sciences Electronic Theses and Dissertations

We are currently living in the golden age of robotic exploration of Mars, with a continued robotic presence there since 1997. Next to Earth, Mars is the planet about which we have gathered the most geologic information. Unlike Earth, Mars does not appear to have plate tectonics, and the planet’s primary and secondary crust is dominated by basalts. Understanding the compositional diversity of the materials that make up the martian crust will give us a better insight into the geologic processes that formed the planet and its subsequent evolution. One large and growing source of martian surface compositions is the …


Market Research On Student Concert Attendance At BGSU's College Of Musical Arts, Mary Solomon May 2019

Honors Projects

Bowling Green State University boasts a well-established College of Musical Arts which holds concerts performed by esteemed faculty, prestigious guest artists, and students. The school hosts these events in Kobacker Hall and Bryan Recital Hall, which can accommodate up to 800 and 250 audience members, respectively. However, performances in Kobacker Hall only fill one-fourth of the 800 seats, on average. Why is this so? This project aims to investigate the factors that influence students’ decisions to attend concerts at the College of Musical Arts (CMA). Through survey research and statistical analysis, this project will look into …


Essays On Econometrics And Rational Choice, Junnan He May 2019

Arts & Sciences Electronic Theses and Dissertations

Decision and choice theory is a topic of interest in both econometrics and microeconomic theory. We contribute to the theory of decision under both contexts, that is, the theory of model selection in econometrics, and the theory of rational decision in microeconomics.

There is a long-standing theoretical interest in model selection. More recently, research on sparse estimators, a class of estimation methods that select and estimate important parameters simultaneously, has been the central focus in model selection. The methods become especially relevant when the problem is of a high-dimensional nature. Theoretically, sparse methods can perform well when the true data generating …


Mechanics Of Phenotypic Aging Trajectories In C. Elegans And Humans, William Zhang May 2019

Arts & Sciences Electronic Theses and Dissertations

Overall, my dissertation integrates longitudinal measurements of physiology to investigate the aging process. In the first half, I examine the surprising and largely unexplained degree of variation in lifespan within even homogeneous populations. I sought to understand how physiological aging differs between long- and short-lived individuals within a population of genetically identical C. elegans reared in a homogeneous environment. Using a novel culture apparatus, I longitudinally monitored aspects of aging physiology across a large population of isolated individuals. Aggregating several measures into an overall estimate of senescence, I find that long- and short-lived individuals start adulthood on an equal physiological …


Topics In Complex And Large-Scale Data Analysis, Guanshengrui Hao May 2019

Arts & Sciences Electronic Theses and Dissertations

The past few decades have witnessed the rapid development of modern technologies. As a result, data collected from modern technologies are evolving towards more complicated structure and larger scale, driving traditional data analysis methods to develop and adapt. In this dissertation, we study three statistical issues arising in data with complicated structure and/or large scale. In Chapter 2, we propose a Bayesian framework via exponential random graph models (ERGM) to estimate the model parameters and network structures for networks with measurement errors. In Chapter 3, we design a novel network sampling algorithm for large-scale networks with community …