Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 31 - 60 of 70

Full-Text Articles in Physical Sciences and Mathematics

Assessing Differential Item Functioning In The Perceived Stress Scale, Nana Amma Berko Asamoah Jul 2020

Graduate Theses and Dissertations

When an item on a test functions differently for subgroups of respondents with respect to an exogenous variable (or covariate) after conditioning on the latent variable of interest, the item is said to exhibit Differential Item Functioning (DIF). The 10-item Perceived Stress Scale (PSS10) is administered to respondents via MTurk to quantify “perceived stress” and identify if items on the scale function differently for specific subgroups defined by age, sex, race, marital status, number of children, employment status and social media usage.

The purpose of this study was to compare traditional DIF detection approaches (Mantel-Haenszel, logistic regression, likelihood ratio test …


Learning Networks With Categorical Data Using Distance Correlation, And A Novel Graph-Based Multivariate Test, Jian Tinker Jul 2020

Graduate Theses and Dissertations

We study the use of distance correlation for statistical inference on categorical data, especially the induction of probability networks. Szekely et al. first defined distance correlation for continuous variables in [42], and Zhang translated the concept into the categorical setting in [57] by defining dCor(X,Y) for categorical variables X = (x1,...,xI) and Y = (y1,...,yJ), where P(X=xi)=πi and P(Y=yj)=πj, with the formula [Please open the document]

Part I of the dissertation covers the background we need to understand this formula, and prepares us to analyze the properties and performance of its applications.

Part II then presents the main results of …


Structural Analysis Of The Multifunctional Spoiie Regulatory Protein Of Clostridioides Difficile., Blythe Emily Bunkers Jul 2020

Graduate Theses and Dissertations

Clostridioides (formerly Clostridium) difficile is a medically relevant pathogen pertinent to infectious disease research. C. difficile is distinctly known for its ability to produce two toxins, enterotoxin A and cytotoxin B, and its propensity to colonize the mammalian gastrointestinal tract. It is known that metabolism is tightly correlated with sporulation in endospore producers such as C. difficile, but an interesting and novel regulatory relationship found by the Ivey lab has yet to be understood. The relationship explored in this study is between the sporulation factor SpoIIE and an ABC peptide transporter, app, whose expression SpoIIE represses. In this study, two …


Measuring Sexual Excitation And Sexual Inhibition In A Dutch-Speaking Sample, Malachi Willis Jul 2020

Graduate Theses and Dissertations

Background: Individual differences in sexual excitation and sexual inhibition are important predictors of sexual functioning. Psychometric instruments for these aspects of sexual response were originally developed separately for men (Sexual Inhibition/Sexual Excitation Scales [SIS/SES]) and women (Sexual Excitation/Sexual Inhibition Inventory for Women [SESII-W]). These measures were then adapted to function similarly in samples comprising both men and women (Sexual Inhibition/Sexual Excitation Scales-Short Form [SIS/SES-SF] and Sexual Excitation/Sexual Inhibition Inventory for Women and Men [SESII-W/M], respectively). No published study to our knowledge has administered the SIS/SES and SESII-W/M questionnaires to a sample of both women and men. In the present …


Detecting Differentially Co-Expressed Gene Modules Via The Edge-Count Test, Anne Gratius Lin Dec 2019

Graduate Theses and Dissertations

Background

Gene expression profiling by microarray has been used to uncover molecular variations in many different diseases. Complementary to conventional differential expression analysis, differential co-expression analysis can identify gene markers at both the systematic and the granular level. Differential co-expression network analysis has three aspects: global topological comparison of networks, identification of differentially co-expressed clusters, and identification of differentially co-expressed genes and gene pairs. To date, most of the available methods still rely on Pearson’s correlation coefficient despite its insensitivity to nonlinear relationships.

Results

Here we present an approach that is robust to nonlinearity by using the edge-count test for differential co-expression analysis. …


Effect Of Cross-Validation On The Output Of Multiple Testing Procedures, Josh Dallas Price Aug 2019

Graduate Theses and Dissertations

High-dimensional data with sparsity is routinely observed in many scientific disciplines. Filtering out the signals embedded in noise is a canonical problem in such situations requiring multiple testing. The Benjamini-Hochberg procedure using False Discovery Rate control is the gold standard in large-scale multiple testing. In Majumder et al. (2009) an internally cross-validated form of the procedure is used to avoid a costly replicate study and the complications that arise from population selection in such studies (i.e. extraneous variables). I implement this procedure and run extensive simulation studies under increasing levels of dependence among parameters and different data generating …
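For readers unfamiliar with the Benjamini-Hochberg procedure named in this abstract, the following is a minimal sketch of the standard step-up rule only. It is not the cross-validated variant of Majumder et al. (2009), nor the thesis author's implementation. The function name `benjamini_hochberg` and the default level `alpha=0.05` are illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Standard Benjamini-Hochberg step-up rule (illustrative sketch).

    Returns a list of booleans, True where the hypothesis is rejected.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k with p_(k) <= k * alpha / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k = rank

    # Reject the k hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected
```

Because the rule is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value meets a later threshold.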


Spatio-Temporal Analysis Of Tree Ring Chronology And Precipitation, Ruizhe Yin Aug 2019

Graduate Theses and Dissertations

Tree ring chronology data is known to reflect regional climate due to the strong impact of rainfall and temperature. Therefore, tree ring data can be used to reconstruct historical climate in order to understand how climate changed in the past and to make predictions about its future behavior. For simplicity, this research only considers the influence of precipitation on tree ring growth within the New England area. A total of 94 measurement sites are used to record tree ring width over 881 years and corresponding precipitation data are given at some locations for 121 years. We developed a …


Spatio-Temporal Prediction Of Arkansas Gubernatorial Election, Michael Harris Aug 2019

Graduate Theses and Dissertations

Our goal is to create spatio-temporal models for predicting future gubernatorial elections. As a concrete example of how well our models work, we use past data to predict the 2018 Arkansas gubernatorial election and use the existing 2018 election data to check our models' predictive accuracy. Gubernatorial election data was collected from the Arkansas Secretary of State website, while related covariate data was collected from the website of the Federal Reserve Bank of St. Louis. The data we collect is at the county level. For predictive purposes we fit multiple models to the data using Markov chain Monte Carlo and …


Probabilistic Models For Order-Picking Operations With Multiple In-The-Aisle Pick Positions, Jingming Liu Aug 2019

Graduate Theses and Dissertations

The development of probability density functions (pdfs) for travel time of a narrow aisle lift truck (NALT) and an automated storage and retrieval (AS/R) machine is the focus of the dissertation. The multiple in-the-aisle pick positions (MIAPP) order picking system can be modeled as an M/G/1 queueing problem in which storage and retrieval requests are the customers and the vehicle (NALT or AS/R machine) is the server. Service time is the sum of travel time and the deterministic time to pick up and deposit a pallet (TPD).
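As a rough illustration of the M/G/1 framing above, the sketch below applies the standard Pollaczek-Khinchine formula for the mean waiting time in queue, with service time taken as random travel time plus the deterministic TPD. It is not the dissertation's derivation. The function name and all parameter names are hypothetical, and it assumes Poisson request arrivals and a stable queue (utilization below 1).

```python
def mg1_mean_wait(arrival_rate, mean_travel, var_travel, tpd):
    """Mean time a request waits in queue for an M/G/1 system (sketch).

    Service time S = travel time + tpd, where tpd is deterministic, so
    E[S] = mean_travel + tpd and Var[S] = var_travel.
    """
    mean_service = mean_travel + tpd
    # Second moment of service time: E[S^2] = Var[S] + E[S]^2.
    second_moment = var_travel + mean_service ** 2
    # Server utilization; the queue is stable only if it is below 1.
    rho = arrival_rate * mean_service
    if not 0 <= rho < 1:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    # Pollaczek-Khinchine: Wq = lambda * E[S^2] / (2 * (1 - rho)).
    return arrival_rate * second_moment / (2 * (1 - rho))
```

Only the first two moments of the travel time enter this mean; the full travel time pdf is needed for the distribution of the waiting time.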

Our first contribution is the development of travel time pdfs for retrieval operations …


A Hidden Markov Factor Analysis Framework For Seizure Detection In Epilepsy Patients, Mahboubeh Madadi May 2019

Graduate Theses and Dissertations

Approximately 1% of the world population suffers from epilepsy. Continuous long-term electroencephalographic (EEG) monitoring is the gold standard for recording epileptic seizures and assisting in the diagnosis and treatment of patients with epilepsy. Detecting seizures in the recorded EEG is a laborious, time-consuming and expensive task. In this study, we propose an automated seizure detection framework to assist electroencephalographers and physicians with identification of seizures in recorded EEG signals. In addition, an automated seizure detection algorithm can be used for treatment through automatic intervention during the seizure activity and on-time triggering of the injection of a radiotracer to …


Advanced Statistics In Arkansas Sports Reporting, Andrew Lee Epperson May 2019

Graduate Theses and Dissertations

This study seeks to analyze how Arkansas’ sports journalists are adapting to the recent surge in available advanced statistics that are being used by certain national news organizations. Using qualitative research that includes in-depth interviews with a number of individuals on the print, broadcast, and athletics sides of sports coverage, we discover how journalists and coaches use these next-generation analytics, what they fundamentally mean for the evolution of each respective path, and why so few Arkansas reporters and writers use them at the time of this paper’s defense. We see how budgets and deadlines restrict the use of these …


Comparing Elo, Glicko, Irt, And Bayesian Irt Statistical Models For Educational And Gaming Data, Breanna Morrison May 2019

Graduate Theses and Dissertations

Statistical models used for estimating skill or ability levels often vary by field; however, their underlying mathematical models can be very similar. Differences in the underlying models can be due to the need to accommodate data with different underlying formats and structure. As the models from varying fields increase in complexity, the range of data types to which they can be applied may also increase. Models that are applied to educational or psychological data have advanced to accommodate a wide range of data formats, including increased estimation accuracy with sparsely populated data matrices. Conversely, the field of online …


A Bayesian Framework For Estimating Seismic Wave Arrival Time, Hua Zhong May 2019

Graduate Theses and Dissertations

Because earthquakes have a large impact on human society, statistical methods for better studying earthquakes are required. One characteristic of earthquakes is the arrival time of seismic waves at a seismic signal sensor. Once we can estimate the earthquake arrival time accurately, the earthquake location can be triangulated, and assistance can be sent to that area correctly. This study presents a Bayesian framework to predict the arrival time of seismic waves with associated uncertainty. We use a change point framework to model the different conditions before and after the seismic wave arrives. To evaluate the performance of the model, we …


Sequential Inference For Hidden Markov Models, Michael Ellis Dec 2018

Graduate Theses and Dissertations

In many applications data are collected sequentially in time with very short time intervals between observations. If one is interested in using new observations as they arrive in time then non-sequential Bayesian inference methods, such as Markov Chain Monte Carlo (MCMC) sampling, can be too slow. Increasingly, state space models are being used to model nonlinear and non-Gaussian systems. The structure of state space models allows for sequential Bayesian inference so that an approximation to the posterior distribution of interest can be updated as new observations arrive. In special cases, the exact posterior distribution can be updated through conjugate Bayesian …


Budget-Constrained Regression Model Selection Using Mixed Integer Nonlinear Programming, Jingying Zhang Dec 2018

Graduate Theses and Dissertations

Regression analysis fits predictive models to data on a response variable and corresponding values for a set of explanatory variables. Often data on the explanatory variables come at a cost from commercial databases, so the available budget may limit which ones are used in the final model.

In this dissertation, two budget-constrained regression models are proposed for continuous and categorical variables respectively using Mixed Integer Nonlinear Programming (MINLP) to choose the explanatory variables to be included in solutions. First, we propose a budget-constrained linear regression model for continuous response variables. Properties such as solvability and global optimality of the proposed …


Quantitative Microbial Risk Assessment For Parts, Ground, And Msc Poultry Product Including Intervention Analysis And Exploration Of Enterobacteriaceae As An Indicator Organism In Poultry Processing, Leigh Ann Parette Dec 2018

Graduate Theses and Dissertations

Samples collected at five different large bird poultry processing facilities over a period of 7 months from prescald to post debone locations were enumerated for Enterobacteriaceae, Salmonella spp., and Campylobacter spp. and the results were used to create Quantitative Microbial Risk Analyses (QMRA) models for parts, ground, and mechanically separated chicken (MSC) products. Sensitivity analyses indicated the points in the process at which reductions would be most advantageous to the endpoint and simulation models were run to test reductions required to meet the current USDA performance standards.

These data were analyzed to determine the reductions from one node (location) to …


A Generative Statistical Approach For Data Classification In A Biologically Inspired Design Tool, Marvin Manuel Arroyo Rujano Dec 2018

Graduate Theses and Dissertations

The objective of the research this thesis describes is to find a way to classify text-based descriptions of biological adaptation to support biologically inspired design. Biologically inspired design is a fairly new field with ongoing research. There are different tools to assist designers and biologists in bio-inspired design. Some of the most common are BioTRIZ and AskNature. In recent years, more tools have been proposed to aid and make research in the field easier, for example, the Biologically Inspired Adaptive System Design (BIASD) tool. This tool was designed with the goal of helping designers in early design stages generate more …


Spatio-Temporal Reconstruction Of Remote Sensing Observations, Kamrul Khan Dec 2018

Graduate Theses and Dissertations

The USDA Forest Service aims to use satellite imagery for monitoring and predicting changes in forest conditions over time within the country. We specifically focus on a 230,400-hectare region in north-central Wisconsin between 2003 and 2012. The auxiliary data collected from the satellite imagery of this region are relatively dense in space and time and can be used to efficiently predict how the forest condition changed over that decade. However, these records have a significant proportion of missing values due to weather conditions and system failures. To fill in these missing values, we build spatio-temporal models based on …


Comparison Of Correlation, Partial Correlation, And Conditional Mutual Information For Interaction Effects Screening In Generalized Linear Models, Ji Li Aug 2018

Graduate Theses and Dissertations

Numerous screening techniques have been developed in recent years for genome-wide association studies (GWASs) (Moore et al., 2010). In this thesis, a novel model-free screening method was developed and validated by an extensive simulation study. Many screening methods were mainly focused on main effects, while very few studies considered the models containing both main effects and interaction effects. In this work, the interaction effects were fully considered and three different methods (Pearson’s Correlation Coefficient, Partial Correlation, and Conditional Mutual Information) were tested and their prediction accuracies were compared.

Pearson’s Correlation Coefficient method, which is a direct interaction screening (DIS) procedure, …


Adapting To Sparsity And Heavy Tailed Data, Mohamed Abdelkader Abba Aug 2018

Graduate Theses and Dissertations

The Lasso and the Horseshoe, gold standards in the frequentist and Bayesian paradigms, critically depend on learning the error variance. This causes a lack of scale invariance and adaptability to heavy-tailed data. The √Lasso [Belloni et al., 2011] attempts to correct this by using the ℓ1 norm on both the likelihood and the penalty for the objective function. In contrast, there are essentially no methods for uncertainty quantification or automatic parameter tuning via a formal Bayesian treatment of an unknown error distribution. On the other hand, Bayesian shrinkage priors lacking a local shrinkage term fail to adapt to the large …


Hierarchical Bayesian Regression With Application In Spatial Modeling And Outlier Detection, Ghadeer Mahdi May 2018

Graduate Theses and Dissertations

This dissertation makes two important contributions to the development of Bayesian hierarchical models. The first contribution is focused on spatial modeling. Spatial data observed on a group of areal units is common in scientific applications. The usual hierarchical approach for modeling this kind of dataset is to introduce a spatial random effect with an autoregressive prior. However, the usual Markov chain Monte Carlo scheme for this hierarchical framework requires the spatial effects to be sampled from their full conditional posteriors one by one, resulting in poor mixing. More importantly, it makes the model computationally inefficient for datasets with a large number of units. …


Bayesian Model For Detection Of Outliers In Linear Regression With Application To Longitudinal Data, Zahraa Al-Sharea Dec 2017

Graduate Theses and Dissertations

Outlier detection is one of the most important challenges in many present-day applications. Outliers can occur due to uncertainty in data generating mechanisms or due to an error in data recording/processing. Outliers can drastically change a study's results and make predictions less reliable. Detecting outliers in longitudinal studies is quite challenging because this kind of study works with observations that change over time. Therefore, the same subject can produce an outlier at one point in time but regular observations at all other time points. A Bayesian hierarchical model assigns parameters that can quantify whether each observation is an outlier …


Identifying Three-Way Gene Interactions From Microarray Data Using Kolmogorov-Smirnov And Cross-Match Tests, Shubhashree Khadka Aug 2017

Graduate Theses and Dissertations

The human gene network is much more complex than pairwise interactions among genes. Zhang et al. [6] extracted microarray data from the International Genomics Consortium (IGC) and presented the detection of three-way gene interactions in their paper using Fisher’s z-transformation test. Three-way gene interactions come closer than pairwise correlations to representing the complex gene structures. Additionally, detecting them is more tractable than assessing interactions among four or more genes. In this paper, we simulate different models where Fisher’s test might not be as effective. Zhang et al.’s approach utilized Pearson’s correlation coefficients and involved detection of linear interactions only. Since gene …


Genomic And Physiological Approaches To Improve Drought Tolerance In Soybean, Avjinder Kaler Aug 2017

Graduate Theses and Dissertations

Drought stress is a major global constraint for crop production, and improving crop tolerance to drought is of critical importance. Direct selection of drought tolerance among genotypes for yield is limited because of low heritability, polygenic control, epistasis effects, and genotype by environment interactions. Crop physiology can play a major role for improving drought tolerance through the identification of traits associated with drought tolerance that can be used as indirect selection criteria in a breeding program. Carbon isotope ratio (δ13C, associated with water use efficiency), oxygen isotope ratio (δ18O, associated with transpiration), canopy temperature (CT), canopy wilting, and canopy coverage …


A Linear-Linear Growth Model With Individual Change Point And Its Application To Ecls-K Data, Ping Zhang Aug 2017

Graduate Theses and Dissertations

The latent growth curve model with piecewise functions is a useful analytic tool for investigating a growth trajectory consisting of distinct phases of development in observed variables. An interesting feature of the growth trajectory is the time point at which the trajectory changes from one phase to another. In this thesis, we propose a simple computational pipeline to locate the change point under the linear-linear piecewise model and apply it to the longitudinal study of reading and math ability in early childhood (from kindergarten to eighth grade). In the first step, we conduct hypothesis testing to filter out the …


A Bayesian Variable Selection Method With Applications To Spatial Data, Xiahan Tang May 2017

Graduate Theses and Dissertations

This thesis first describes the general idea behind Bayesian inference, various sampling methods based on Bayes' theorem, and many examples. Then a Bayesian approach to model selection, called Stochastic Search Variable Selection (SSVS), is discussed. It was originally proposed by George and McCulloch (1993). In a normal regression model where the number of covariates is large, only a small subset tends to be significant most of the time. This Bayesian procedure specifies a mixture prior for each of the unknown regression coefficients; the mixture prior was originally proposed by Geweke (1996). This mixture prior will be updated as data becomes …


Monte Carlo Methods In Bayesian Inference: Theory, Methods And Applications, Huarui Zhang Dec 2016

Graduate Theses and Dissertations

Monte Carlo methods are becoming more and more popular in statistics due to the fast development of efficient computing technologies. One of the major beneficiaries of this advent is the field of Bayesian inference. The aim of this thesis is two-fold: (i) to explain the theory justifying the validity of the simulation-based schemes in a Bayesian setting (why they should work) and (ii) to apply them in several different types of data analysis that a statistician has to routinely encounter. In Chapter 1, I introduce key concepts in Bayesian statistics. Then we discuss Monte Carlo Simulation methods in detail. Our …


Analysis Of Break-Points In Financial Time Series, Jean Remy Habimana Dec 2016

Graduate Theses and Dissertations

A time series is a set of random values collected at equal time intervals; this randomness makes these types of series hard to predict because the structure of the series may change at any time. As discussed in previous research, the structure of a time series may change at any time due to a change in the mean and/or variance of the series. Consequently, it is wise not to assume that these series are stationary. This paper discusses a method of analyzing time series by considering the entire series non-stationary, assuming there is random change in unconditional …


Statistical Modeling Of The Temporal Dynamics In A Large Scale-Citation Network, Luis Javier Ek Jr. May 2016

Graduate Theses and Dissertations

Citation networks of papers are vast networks that grow over time. The way a citation network grows is not entirely random but follows a preferential attachment relationship: highly cited papers are more likely to be cited by newly published papers. The result is a network whose degree distribution follows a power law. This growth of the citation network of papers will be modeled with a negative binomial regression coupled with logistic growth and/or a Cauchy distribution curve. Then a Barabasi-Albert model, based on the negative binomial models, and a combination of the Dirichlet distribution and multinomial will be …


Risk Estimation Toward A Natural History Model For Low Grade Glioma Patients, Anh Thi Hoang Pham May 2016

Graduate Theses and Dissertations

Glioma is a common type of primary brain tumor that represents 28% of all brain tumors and 80% of malignant tumors. According to a recent study by the Centers for Disease Control and Prevention (CDC), gliomas account for 53%, 35% and 29% of all brain tumors (68%, 74% and 81% of malignant brain tumors) among children (aged 0-14), teenagers (aged 15-19) and young adults, respectively. Gliomas are often diagnosed through radiological imaging and histopathology. There are two main groups of gliomas following World Health Organization’s classification: Low grade gliomas (LGG), or grade I and II gliomas; and high grade gliomas …