Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 30 of 36

Full-Text Articles in Physical Sciences and Mathematics

Variation In Personality Among Semi-Wild Myanmar Timber Elephants, Sateesh Venkatesh Dec 2020

Theses and Dissertations

This study examines two personality traits: exploration and neophobia, which could influence human-elephant conflicts. Thirty-one semi-wild elephants were tested over two trials using a custom novel puzzle tube containing three tasks and three rewards. Our studies show that elephants do vary significantly between individuals in both exploration and neophobia.


Incorporation And Measurement Of Uncertainty In Clustered And Spatial Data, Yuan Hong Oct 2020

Theses and Dissertations

Analyzing population representative datasets for local estimation and predictions over time is important for monitoring related public health issues; however, there are many statistical challenges associated with such analyses. Mixed effect models are one of the common options which can incorporate time and spatial effects in the model, and the related inference is well established.

In the first part of this dissertation, to estimate area-level prevalence using individual-level data, small area estimation (SAE) with post-stratified mixed effect models was used, with sampling weights also incorporated. However, whether poststratification, which requires more computational effort, can improve estimation accuracy is …


Estimation And Inference Under Model Uncertainty, Yizheng Wei Oct 2020

Theses and Dissertations

Chapter 1 of this dissertation proposes a consistent and locally efficient estimator to estimate the model parameters for a logistic mixed effect model with random slopes. Our approach relaxes two typical assumptions: the random effects being normally distributed, and the covariates and random effects being independent of each other. Adhering to these assumptions is particularly difficult in health studies where in many cases we have limited resources to design experiments and gather data in long-term studies, while new findings from other fields might emerge, suggesting the violation of such assumptions. So it is crucial if we could have an estimator …


Categorical And Fuzzy Ensemble-Based Algorithms For Cluster Analysis, Bridget Nicole Manning Oct 2020

Theses and Dissertations

This dissertation focuses on improving multivariate methods of cluster analysis. In Chapter 3 we discuss methods relevant to the categorical clustering of tertiary data while Chapter 4 considers the clustering of quantitative data using ensemble algorithms. Lastly, in Chapter 5, future research plans are discussed to investigate the clustering of spatial binary data.

Cluster analysis is an unsupervised methodology whose results may be influenced by the types of variables recorded on observations. When dealing with the clustering of categorical data, solutions produced may not accurately reflect the structure of the process that generated them. Increased variability within the latent structure …


Snow-Albedo Feedback In Northern Alaska: How Vegetation Influences Snowmelt, Lucas C. Reckhaus Aug 2020

Theses and Dissertations

This paper investigates how the snow-albedo feedback mechanism of the Arctic is changing in response to rising climate temperatures. Specifically, it examines the interplay of vegetation and snowmelt and how these two variables can be correlated. This has the potential to refine climate modelling of the spring transition season. Research was conducted at the ecoregion scale in northern Alaska from 2000 to 2020. Each ecoregion is defined by distinct topographic and ecological conditions, allowing for meaningful contrast between the patterns of spring albedo transition across surface conditions and vegetation types. The five most northerly ecoregions of Alaska are chosen as they encompass …


Machine-Learning-Based Prediction Of Sepsis Events From Vertical Clinical Trial Data: A Naïve Approach, Tyler Michael Gaddis Aug 2020

Theses and Dissertations

Sepsis is a potentially life-threatening condition characterized by a dysregulated, disproportionate immune response to infection by which the afflicted body attacks its own tissues, sometimes to the point of organ failure, and in the worst cases, death. According to the Centers for Disease Control and Prevention (CDC), sepsis kills upwards of 270,000 Americans annually, though this figure may be greater given certain ambiguities in the currently accepted diagnostic framework of the disease.

This study attempted to first establish an understanding of past definitions of sepsis, and to then recommend use of machine learning as integral in an …


Estimating Distortion Risk Measures Under Truncated And Censored Data Scenarios, Sahadeb Upretee Aug 2020

Theses and Dissertations

In insurance data analytics and actuarial practice, a broad class of risk measures, known as distortion risk measures, are used to capture the riskiness of the distribution tail. Point and interval estimates of the risk measures are then employed to price extreme events, to develop reserves, to design risk transfer strategies, and to allocate capital. When solving such problems, the main statistical challenge is to choose an appropriate estimate of a risk measure and to assess its variability. In this context, the empirical …


An Investigation Of Gene Regulatory Network State Space Variability, Sara Faye Liesman Jul 2020

Theses and Dissertations

Genes are segments of DNA that provide a blueprint for cells and organisms to effectively control processes and regulations within individuals. There have been many attempts to quantify these processes, as a greater understanding of how genes operate could have large impacts on both personalized and precision medicine. Gene interactions are of particular interest; however, current biological methods cannot easily reveal the details of these interactions. Therefore, we infer a network of interactions from gene expression data, which we call a gene regulatory network, or GRN. Due to the robust behavior of genes and the inherent variability within interactions, models …


A Study Of The Efficacy Of Machine Learning For Diagnosing Obstructive Coronary Artery Disease In Non-Diabetic Patients, Demond Larae Handley Jul 2020

Theses and Dissertations

According to the Centers for Disease Control and Prevention, about 18.2 million adults age 20 and older have Coronary Artery Disease in the United States. Early diagnosis is therefore of crucial importance to help prevent debilitating consequences, and principally death for many patients. In this study we use data containing gene expression values from peripheral blood samples in 198 non-diabetic patients, with the goal of developing an age and sex gene expression model for diagnosis of Coronary Artery Disease. We employ machine learning methods to obtain a classification based on genetic information, age and sex. Our implementation uses feed forward …


Bayesian Zero-Inflated Model For Ordinal Data, Huizhong Yang Jul 2020

Theses and Dissertations

Datasets with a relatively large number of zeros are commonly seen in medical applications. Although models like the Zero-inflated Poisson (ZIP) model have been proposed for count data, there are still some issues with ordinal data that have excess zeros. In this paper, we developed a Bayesian approach to accommodate the excess zeros in ordinal data. Intellectual disability (ID), also known as mental retardation (MR), is a disability characterized by below-average intelligence or mental ability and a lack of the skills necessary for daily life. A person with intellectual disability has limitations in intellectual functioning and adaptive behaviors. Intellectual disability is a …


High-Dimensional Inference Based On The Leave-One-Covariate-Out Regularization Path, Xiangyang Cao Jul 2020

Theses and Dissertations

The increasingly rapid emergence of high dimensional data, where the number of variables p may be larger than the sample size n, has necessitated the development of new statistical methodologies. LASSO and variants of LASSO have been proposed and have been the most popular estimators for high dimensional regression models. However, not much work has focused on analyzing and summarizing the information contained in the entire solution path of the LASSO. This dissertation consists of three research projects that propose and extend the Leave-One-Covariate-Out (LOCO) solution path statistic to regression and graphical models.

In the first chapter, we propose a new …


Semiparametric Regression Analysis Of Survival Data And Panel Count Data, Lu Wang Jul 2020

Theses and Dissertations

Both censored survival data and panel count data arise commonly in real-life studies in many fields such as epidemiology, social science, and medical research. In these studies, subjects are usually examined multiple times at periodic or irregular follow-up examinations. Censored data are studied when the exact failure times of the events are of interest but not all of these exact times are directly observed. Some of the failure times of the event of interest are only known to fall within some intervals formed by the observation times. Panel count data are under investigation when the exact times of the recurrent events …


Network-Based Statistical Analysis Of Functional Magnetic Resonance Imaging Data From Aphasia Patients, Xingpei Zhao Jul 2020

Theses and Dissertations

Functional magnetic resonance imaging (fMRI) is a neuroimaging technique that provides insight into brain function and activity. Network models of fMRI signals can reveal functional connectivity related to certain brain disorders, such as post-stroke aphasia. This thesis aims to identify the functional connections that distinguish anomic and Broca’s aphasia by comparing the resting-state fMRI from the patients with these two types of aphasia. The network-based statistic (NBS) approach is used to detect such connections. After the analytic pipeline is applied to the fMRI data, the NBS approach identifies a distinct subnetwork between the two types of aphasia, which involves the …


The Practical Advantages And Disadvantages Of Laplace Regression As An Alternative To Cox Proportional Hazards Model: A Comparison Via Simulation, Sydney Smith Jul 2020

Theses and Dissertations

The Cox proportional hazards model is the most common regression technique for survival analysis. However, the proportional hazards assumption restricts its use to a limited group of multiplicative models. Laplace regression is a flexible quantile regression technique for censored observations that is appropriate in a wider variety of applications than the Cox proportional hazards model. Instead of estimating a hazard ratio, Laplace regression, which is free from a proportionality assumption, can be used to estimate many adjusted percentiles of survival time, allowing for a more complete description of the association of interest. This paper compares the performance of …


A Study Of Cusum Statistics On Bitcoin Transactions, Ivan Perez May 2020

Theses and Dissertations

In this thesis, our objective is to study the relationship between transaction price and volume in the BTC/USD Coinbase exchange. In the second chapter, we develop a consecutive CUSUM algorithm to detect instantaneous changes in the arrival rate of market orders. We begin by estimating a baseline rate using the assumption of a local time-homogeneous Poisson process. Our observations lead us to reject the plausibility of a time-homogeneous Poisson model on a more global scale by using a chi squared test. We thus proceed to use CUSUM-based alarms to detect consecutive upward and downward changes in the arrival rate of …
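
As a rough illustration of the kind of CUSUM alarm described above, the following Python sketch monitors interarrival times for an increase in a Poisson arrival rate. It is a generic one-sided CUSUM under stated assumptions, not the thesis's consecutive algorithm: the function name `cusum_rate_increase`, the baseline and alternative rates, the threshold, and the synthetic data are all hypothetical.

```python
import math


def cusum_rate_increase(interarrival_times, base_rate, alt_rate, threshold):
    """Return the index of the first alarm, or None if no alarm is raised.

    Assumes gaps between orders are exponential (a Poisson arrival process)
    and tests for a jump from base_rate to a higher alt_rate.
    """
    stat = 0.0
    for i, gap in enumerate(interarrival_times):
        # Log-likelihood ratio of one exponential gap: alt_rate versus base_rate.
        llr = math.log(alt_rate / base_rate) - (alt_rate - base_rate) * gap
        # Reset at zero so evidence only accumulates in favor of a rate increase.
        stat = max(0.0, stat + llr)
        if stat >= threshold:
            return i
    return None


# Hypothetical data: 50 gaps at the baseline pace, then 50 much shorter gaps.
gaps = [1.0] * 50 + [0.25] * 50
alarm_index = cusum_rate_increase(gaps, base_rate=1.0, alt_rate=2.0, threshold=3.0)
print(alarm_index)
```

A downward-change detector would follow the same pattern with an alternative rate below the baseline; raising the threshold trades slower detection for fewer false alarms.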


Biomarker Development For Use In Regression Calibration, Yiwen Zhang May 2020

Theses and Dissertations

It is challenging to alleviate systematic measurement error in self-reported data when studying the associations between dietary intakes and chronic disease risk. The regression calibration method has been used for this purpose when an objectively measured biomarker that satisfies a classical measurement error assumption is available. The requirement for the biomarkers needs to be quite strong and very few dietary intake biomarkers as such have been developed. Feeding studies provide opportunities to develop such potential biomarkers using regression methods with a much larger variety of dietary variables. However, the measurement error for the resulting biomarkers will be of Berkson type …


Infant Mortality In The United States: Socioeconomic Factors Predicting Infant Survival In Late Neo-Natal And Post Neo-Natal Infants From Birth Certificate Data, Mark Brunk-Grady May 2020

Theses and Dissertations

According to the Centers for Disease Control and Prevention, the infant mortality rate in the United States in 2018 was 5.6 deaths per 1000 live births. Infant mortality is defined as a child being born alive but dying before their first birthday. This study aimed to determine if adding socioeconomic factors to traditional predictive survival models improved the predictive power in terms of survival for late and post neonatal infants. Secondly, this study looked to develop a risk score to predict which mothers would be classified as “High” or “Low” risk for infant death.

Data were analyzed from a …


Smoothed Quantiles For Claim Frequency Models, With Applications To Risk Measurement, Ponmalar Suruliraj Ratnam May 2020

Theses and Dissertations

Statistical models for the claim severity and claim frequency variables are routinely constructed and utilized by actuaries. Typical applications of such models include identification of optimal deductibles for selected loss elimination ratios, pricing of contract layers, determining credibility factors, risk and economic capital measures, and evaluation of effects of inflation, market trends and other quantities arising in insurance. While the actuarial literature on the severity models is extensive and rapidly growing, that for the claim frequency models lags behind. One of the reasons for such a gap is that various actuarial metrics do not possess “nice” statistical properties for the …


Fitting Of Lotka-Volterra Model For Coupled Population Growth Data Through Least-Squares Estimation Of Parameters, Jessica Ann Harter May 2020

Theses and Dissertations

The populations of two types of bacteria found in the Gulf Coast of Florida, V. chagasii and V. harveyi, can be described by the Lotka-Volterra competition model. Using data gathered in experiments conducted by Bury and Pickett (2015), we take a different approach to find parameter estimates using numerical methods in R. In particular, we find a numerical solution to the coupled set of ODEs and minimize the sum of squared errors in order to obtain the optimal parameter estimates that will fit the data best. In order to get a sense of accuracy of these parameter estimates, we use bootstrap …
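
The fitting idea summarized above (solve the coupled ODEs numerically, then minimize the sum of squared errors) can be sketched in a few lines. The thesis works in R; the Python version below is only an illustrative sketch. The parameterization of the competition model, the fixed-step Runge-Kutta solver, the parameter values, and the coarse one-parameter search are all assumptions made for the example, and the "observations" are synthetic rather than the Bury and Pickett data.

```python
def lv_competition(state, params):
    """Right-hand side of a two-species Lotka-Volterra competition model."""
    x, y = state
    r1, r2, cap1, cap2, a12, a21 = params
    dx = r1 * x * (1.0 - (x + a12 * y) / cap1)
    dy = r2 * y * (1.0 - (y + a21 * x) / cap2)
    return dx, dy


def rk4_solve(params, state0, dt, n_steps):
    """Integrate the ODEs with classical fourth-order Runge-Kutta steps."""
    x, y = state0
    path = [(x, y)]
    for _ in range(n_steps):
        k1 = lv_competition((x, y), params)
        k2 = lv_competition((x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1]), params)
        k3 = lv_competition((x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1]), params)
        k4 = lv_competition((x + dt * k3[0], y + dt * k3[1]), params)
        x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        y += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
        path.append((x, y))
    return path


def sse(params, observed, state0, dt):
    """Sum of squared errors between the model path and the observations."""
    fitted = rk4_solve(params, state0, dt, len(observed) - 1)
    return sum((fx - ox) ** 2 + (fy - oy) ** 2
               for (fx, fy), (ox, oy) in zip(fitted, observed))


# Hypothetical parameters (r1, r2, K1, K2, a12, a21) used to generate synthetic data.
TRUE_PARAMS = (1.0, 0.8, 100.0, 80.0, 0.5, 0.6)
STATE0 = (5.0, 5.0)
DT = 0.1
observed = rk4_solve(TRUE_PARAMS, STATE0, DT, 200)

# Coarse search over the competition coefficient a12, other parameters held fixed.
candidates = [0.1 * i for i in range(1, 10)]
best_a12 = min(
    candidates,
    key=lambda a: sse(TRUE_PARAMS[:4] + (a,) + TRUE_PARAMS[5:], observed, STATE0, DT),
)
print(best_a12)
```

In practice all six parameters would be estimated jointly with a numerical optimizer, and the fit repeated on resampled data to gauge the variability of the estimates.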


Flexible Regression Models For Survival Data, Ennan Gu Apr 2020

Theses and Dissertations

Survival analysis is a branch of statistics for analyzing time-to-event data, also called survival data. One important feature of survival data is censoring, which means that not all the subjects’ survival times are observed directly. Among all the survival data, right-censored data are the most common type and consist of some exactly observed survival times and some right-censored observations. In this dissertation, we focus on studying flexible regression models for complicated right-censored survival data when the classical proportional hazards (PH) assumption is not satisfied. Flexible semiparametric regression models can largely avoid misspecification of parametric distributions and thus provide more modeling …


Bayesian Analysis Of Binary Diagnostic Tests And Panel Count Data, Chunling Wang Apr 2020

Theses and Dissertations

This dissertation mainly explores several challenging topics that arise in diagnostic tests and panel count data in the Bayesian framework. Binary diagnostic tests, particularly multiple diagnostic tests with repeated measures and diagnostic procedures with a large number of raters, are studied. For panel count data, most traditional methods only handle panel count data for a single type of recurrent event. In this dissertation, we primarily focus on the case with multiple types of recurrent events.

In Chapter 1, an introduction to the binary diagnostic tests data and panel count data is presented and related literature works are briefly reviewed. To …


Multivariate Joint Models And Dynamic Predictions, Md Akhtar Hossain Apr 2020

Theses and Dissertations

The joint modeling of longitudinal and time-to-event data is an active area of statistical research that has received a lot of attention. The standard joint models, referred to as univariate joint models, allow simultaneous modeling of a single longitudinal outcome and a single time-to-event under an assumption of independent censoring. The majority of the joint modeling research in the last two decades has focused on extending and improving the univariate joint models. While many of the practical applications involve data on multivariate longitudinal outcomes and multiple time-to-events possibly informatively censored by some other terminal time-to-event, the developments of joint …


Studies Of Group Fused Lasso And Probit Model For Right-Censored Data, Tuan Quoc Do Apr 2020

Theses and Dissertations

This document is composed of three main chapters. In the first chapter, we study the mixture of experts, a powerful machine learning model in which each expert handles a different region of the covariate space. However, it is crucial to choose an appropriate number of experts to avoid overfitting or underfitting. A group fused lasso (GFL) term is added to the model with the goal of making the coefficients of the experts and the gating network closer together. An algorithm to optimize the problem is also developed using block-wise coordinate descent in the dual counterpart. Numerical results on simulated and …


Conceptualization And Application Of Deep Learning And Applied Statistics For Flight Plan Recommendation, Nicholas C. Forrest Mar 2020

Theses and Dissertations

The Air Force's Pilot Training Next (PTN) program seeks a more efficient pilot training environment emphasizing the use of virtual reality flight simulators alongside periodic real aircraft experience. The PTN program wants to accelerate the training pace and progress in undergraduate pilot training compared to traditional undergraduate pilot training. Currently, instructor pilots spend excessive time planning and scheduling flights. This research focuses on methods to auto-generate the planning of in-flight events using hybrid filtering and deep learning techniques. The resulting approach captures temporal trends of user-specific and program-wide student performance to recommend a feasible set of graded flight events for …


An Analysis Of A Lighting Prediction Threshold For 45th Weather Squadron Electric Field Mill Data, Charles A. Skrovan Mar 2020

Theses and Dissertations

The mission of the 45th Weather Squadron (45 WS) is to “exploit the weather to assure safe access to air and space” for Patrick Air Force Base, Cape Canaveral Air Force Station (CCAFS), and Kennedy Space Center (KSC) in support of various operations (United States Air Force, n.d.). To support that mission the 45 WS hosts a suite of weather detection instruments that include a lightning warning system that consists of an array of 31 electric field mills (EFM) and a lightning detection and ranging system (Department of the Air Force, 1976). Electric field mills at Cape Canaveral continuously record …


Ground Weather Radar Signal Characterization Through Application Of Convolutional Neural Networks, Stephen M. Lee Mar 2020

Theses and Dissertations

The 45th Weather Squadron supports the space launch efforts out of the Kennedy Space Center and Cape Canaveral Air Force Station for the Department of Defense, NASA, and commercial customers through weather assessments. Their assessment of the Lightning Launch Commit Criteria (LLCC) for avoidance of natural and rocket triggered lightning to launch vehicles is critical in approving space shuttle and rocket launches. The LLCC includes standards for cloud formations, which requires proper cloud identification and characterization methods. Accurate reflectivity measurements for ground weather radar are important to meet the LLCC for rocket triggered lightning. Current linear interpolation methods for ground …


Next-Generation Air Force Weather Metrics Via Bayes Cost Analysis, Brandon M. Bailey Mar 2020

Theses and Dissertations

This research proposes a new methodology for U.S. Air Force weather forecast metrics. Military weather forecasters are essentially statistical classifiers. They categorize future conditions into an operationally relevant category based on current data, much like an Artificial Neural Net or Logistic Regression model. There is extensive literature on statistically-based metrics for these types of classifiers. Additionally, in the U.S. Air Force, forecast errors (errors in classification) have quantifiable operational costs and benefits associated with incorrect or correct classification decisions. There is a methodology in the literature, Bayes Cost, which provides a structure for creating statistically rigorous metrics for classification decisions …


Analysis And Forecasting Of The 360th Air Force Recruiting Group Goal Distribution, Tyler Spangler Mar 2020

Theses and Dissertations

This research utilizes monthly data from 2012-2017 to determine economic or demographic factors that significantly contribute to increased goaling and production potential in areas of the 360th Recruiting Group. Using regression analysis, a model of recruiting goals and production is built to identify squadrons within the 360 RCG's zone that are capable of producing more or fewer recruits and the factors that contribute to this increased or decreased capability. This research identifies that a zone's high school graduation rate, the number of recruiters, and the number of JROTC detachments in a zone are positively correlated with recruiting goals and that …


Characterizing Uncertainty In Correlated Response Variables For Pareto Front Optimization, Peter A. Calhoun Mar 2020

Theses and Dissertations

Current research provides a method to incorporate uncertainty into Pareto front optimization by simulating additional response surface model parameters according to a Multivariate Normal Distribution (MVN). This research shows that, analogous to the univariate case, the MVN understates uncertainty, leading to overconfident conclusions when variance is not known and there are few observations (fewer than 25-30 per response). This research builds upon current methods using simulated response surface model parameters that are distributed according to a Multivariate t-Distribution (MVT), which can be shown to produce a more accurate inference when variance is not known. The MVT better addresses uncertainty in …


Analysis With Dynamic Bayesian Networks Compared To Simulation, Aaron J. Salazar Mar 2020

Theses and Dissertations

This research compares simulations to Dynamic Bayesian Networks in analyzing situations. The research applies models that have known output mean and variance. Queueing systems have theoretical values of the steady-state mean and variance for the number of entities in the system. Monte Carlo simulation development is broken down into two separate approaches: discrete-event simulation and time-oriented simulation. The discrete-event simulation uses pseudo-random numbers to schedule and trigger future events (i.e., customer arrivals and services) and is based on the generated objects. The time-oriented simulation utilizes fixed-width time intervals and updates the system state according to a stochastic process for the set …
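
To make the comparison between a discrete-event simulation and a known steady-state value concrete, here is a minimal Python sketch for a single-server queue. The choice of an M/M/1 queue, the rates, the time horizon, and the function name `simulate_mm1` are assumptions for illustration only; the abstract does not say which queueing systems the research used.

```python
import random


def simulate_mm1(arrival_rate, service_rate, horizon, seed=0):
    """Discrete-event simulation of an M/M/1 queue.

    Returns the time-averaged number of customers in the system over [0, horizon].
    """
    rng = random.Random(seed)
    clock = 0.0
    in_system = 0
    area = 0.0  # integral of the number in system over time
    next_arrival = rng.expovariate(arrival_rate)
    next_departure = float("inf")  # nobody is in service at the start
    while True:
        next_event = min(next_arrival, next_departure)
        if next_event > horizon:
            area += in_system * (horizon - clock)
            break
        area += in_system * (next_event - clock)
        clock = next_event
        if next_arrival <= next_departure:
            # Arrival: schedule the next one; start service if the server was idle.
            in_system += 1
            next_arrival = clock + rng.expovariate(arrival_rate)
            if in_system == 1:
                next_departure = clock + rng.expovariate(service_rate)
        else:
            # Departure: begin serving the next customer if anyone is waiting.
            in_system -= 1
            if in_system > 0:
                next_departure = clock + rng.expovariate(service_rate)
            else:
                next_departure = float("inf")
    return area / horizon


ARRIVAL_RATE, SERVICE_RATE = 0.5, 1.0
rho = ARRIVAL_RATE / SERVICE_RATE
theoretical_mean = rho / (1.0 - rho)  # steady-state mean number in an M/M/1 system
simulated_mean = simulate_mm1(ARRIVAL_RATE, SERVICE_RATE, horizon=100_000.0)
print(theoretical_mean, simulated_mean)
```

A time-oriented version would instead advance the clock in fixed-width steps and draw arrivals and service completions within each step, at the cost of a discretization error that shrinks with the step width.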