Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons



Full-Text Articles in Physical Sciences and Mathematics

Function Space Tensor Decomposition And Its Application In Sports Analytics, Justin Reising Dec 2019


Electronic Theses and Dissertations

Recent advancements in sports information and technology systems have ushered in a new age of applications of both supervised and unsupervised analytical techniques in the sports domain. These automated systems capture large volumes of data points about competitors during live competition. As a result, multi-relational analyses are gaining popularity in the field of Sports Analytics. We review two case studies of dimensionality reduction with Principal Component Analysis and latent factor analysis with Non-Negative Matrix Factorization applied in sports. Also, we provide a review of a framework for extending these techniques for higher order data structures. The primary scope of this …


Communications And Methodologies In Crime Geography: Contemporary Approaches To Disseminating Criminal Incidence And Research, Mitchell Ogden Dec 2019


Electronic Theses and Dissertations

Many tools exist to assist law enforcement agencies in mitigating criminal activity. For centuries, academics have used statistics in the study of crime and criminals, and more recently, police departments have made use of spatial statistics and geographic information systems in that pursuit. Clustering and hot spot methods of analysis are popular in this application for their relative simplicity of interpretation and ease of process. With recent advancements in geospatial technology, it is easier than ever to publicly share data through visual communication tools like web applications and dashboards. Sharing data and results of analyses boosts transparency and the public image of …


Statistical Methods For Estimating And Testing Treatment Effect For Multiple Treatment Groups In Observational Studies., Xiaofang Yan Dec 2019


Electronic Theses and Dissertations

Note: Abstract would not save due to an issue with some of the characters.


Identifying Risk Factors Related To Premature Birth Through Binary Logistic And Proportional Odds Ordinal Logistic Regression, Clayton Elwood Aug 2019


Electronic Theses and Dissertations

Premature birth has been identified as the single greatest cause of death worldwide in children under the age of five. This thesis will implement binary logistic regression and proportional odds ordinal logistic regression to predict different levels of premature birth and identify associated risk factors. The models will be built from the Centers for Disease Control and Prevention's 2014 Vital Statistics Natality Birth Data containing nearly 4 million live births within the United States. Odds ratios and confidence intervals on risk factors were produced utilizing binary logistic regression.


Is Corequisite Developmental Math Effective At East Tennessee State University?, Christine Padden Aug 2019


Electronic Theses and Dissertations

This thesis looks at the corequisite developmental math program at East Tennessee State University (ETSU) and compares its effectiveness to that of the previous developmental math program by comparing student outcomes in MATH 1530. MATH 1530 is a non-calculus-based statistics and probability course that satisfies most majors’ general education math requirements. ETSU sees approximately 1,000 students a year pass through MATH 1530, which is around 6.7% of the total enrollment at ETSU[9]. We are interested in the last five years of the developmental math program before it was changed to corequisite developmental math and the first five years of corequisite …


Robustness Of Semi-Parametric Survival Model: Simulation Studies And Application To Clinical Data, Isaac Nwi-Mozu Aug 2019


Electronic Theses and Dissertations

An efficient way of analyzing survival clinical data such as cancer data is a great concern to health experts. In this study, we investigate and propose an efficient way of handling survival clinical data. Simulation studies were conducted to compare the performance of various survival modeling techniques using the R package "survsim". Model performance was evaluated with varying sample sizes as small ($n5000$). For small and mild samples, the semi-parametric model outperforms or approximates the performance of the parametric model. However, for large samples, the parametric model outperforms the semi-parametric model. We compared the effectiveness and reliability …


Designing And Sample Size Calculation In Presence Of Heterogeneity In Biological Studies Involving High-Throughput Data., Sudhir Srivastava Aug 2019


Electronic Theses and Dissertations

The designing and determination of sample size are important for conducting high-throughput biological experiments such as proteomics experiments and RNA-Seq expression studies, thus leading to better understanding of complex mechanisms underlying various biological processes. The variations in the biological data or technical approaches to data collection lead to heterogeneity for the samples under study. We critically worked on the issues of technical and biological heterogeneity. The quantitative measurements based on liquid chromatography (LC) coupled with mass spectrometry (MS) often suffer from the problem of missing values (MVs) and data heterogeneity. We considered a proteomics data set generated from human kidney …


Novel Bayesian Methodology In Multivariate Problems., Debamita Kundu Aug 2019


Electronic Theses and Dissertations

This dissertation involves developing novel Bayesian methodology for multivariate problems. In particular, it focuses on two contexts: shrinkage-based variable selection in multivariate regression and simultaneous covariance estimation of multiple groups. Both of these projects are centered around fully Bayesian inference schemes based on hierarchical modeling to capture context-specific features of the data and the development of computationally efficient estimation algorithms. Variable selection over a potentially large set of covariates in a linear model is quite popular. In the Bayesian context, common prior choices can lead to a posterior expectation of the regression coefficients that is a sparse (or nearly sparse) …


An Epidemiological Model With Simultaneous Recoveries, Ariel B. Farber Jun 2019


Electronic Theses and Dissertations

Epidemiological models are an essential tool in understanding how infection spreads throughout a population. Exploring the effects of varying parameters provides insight into the driving forces of an outbreak. In this thesis, an SIS (susceptible-infectious-susceptible) model is built partnering simulation methods, differential equations, and transition matrices with the intent to describe how simultaneous recoveries influence the spread of a disease in a well-mixed population. Individuals in the model transition between only two states; an individual is either susceptible — able to be infected, or infectious — able to infect others. Events in this model (infections and recoveries) occur by way …
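For readers unfamiliar with the SIS structure mentioned in this abstract, the following is a minimal sketch of the textbook deterministic SIS model, dI/dt = beta * I * (N - I) / N - gamma * I, integrated with a simple Euler step. It is not the thesis's model: it does not include simultaneous recoveries, and the function name and the parameters `beta`, `gamma`, `dt`, and `t_end` are assumptions chosen for illustration.

```python
def simulate_sis(population, initial_infectious, beta, gamma, dt=0.01, t_end=200.0):
    """Euler integration of the classic well-mixed SIS model (illustrative only).

    beta  : assumed infection rate per contact pair (hypothetical parameter)
    gamma : assumed per-individual recovery rate (hypothetical parameter)
    Returns the number of infectious individuals at time t_end.
    """
    infectious = float(initial_infectious)
    steps = int(round(t_end / dt))
    for _ in range(steps):
        susceptible = population - infectious
        # New infections come from susceptible-infectious contact;
        # recoveries return individuals to the susceptible state.
        new_infections = beta * infectious * susceptible / population
        recoveries = gamma * infectious
        infectious += dt * (new_infections - recoveries)
    return infectious


def endemic_equilibrium(population, beta, gamma):
    """Long-run infectious count N * (1 - gamma / beta) when beta > gamma, else 0."""
    if beta <= gamma:
        return 0.0
    return population * (1.0 - gamma / beta)
```

In this simplified setting, the infection persists only when `beta` exceeds `gamma`; otherwise the infectious count decays toward zero.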


Paper Structure Formation Simulation, Tyler R. Seekins May 2019


Electronic Theses and Dissertations

On the surface, paper appears simple, but closer inspection yields a rich collection of chaotic dynamics and random variables. Predictive simulation of paper product properties is desirable for screening candidate experiments and optimizing recipes but existing models are inadequate for practical use. We present a novel structure simulation and generation system designed to narrow the gap between mathematical model and practical prediction. Realistic inputs to the system are preserved as randomly distributed variables. Rapid fiber placement (~1 second/fiber) is achieved with probabilistic approximation of chaotic fluid dynamics and minimization of potential energy to determine flexible fiber conformations. Resulting digital packed …


Blacklegged Tick (Ixodes scapularis) Distribution In Maine, USA, As Related To Climate Change, White-Tailed Deer, And The Landscape, Susan P. Elias May 2019


Electronic Theses and Dissertations

Lyme disease is caused by the bacterial spirochete Borrelia burgdorferi, which is transmitted through the bite of an infected blacklegged (deer) tick (Ixodes scapularis). Geographic invasion of I. scapularis in North America has been attributed to causes including 20th century reforestation and suburbanization, burgeoning populations of the white-tailed deer (Odocoileus virginianus) which is the primary reproductive host of I. scapularis, tick-associated non-native plant invasions, and climate change. Maine, USA, is a high Lyme disease incidence state, with a history of increasing I. scapularis abundance and northward range expansion. This thesis addresses the question: “To …


Generalizations Of The Arcsine Distribution, Rebecca Rasnick May 2019


Electronic Theses and Dissertations

The arcsine distribution describes the fraction of time one player is winning in a fair coin toss game and has been studied for over a hundred years. There has been little further work on how the distribution changes when the coin tosses are not fair or when a player has already won the initial coin tosses or, equivalently, starts with a lead. This thesis will first cover a proof of the arcsine distribution. Then, we explore how the distribution changes when the coin is unfair. Finally, we will explore the distribution when one person has won the first …
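The fair-coin case described in this abstract can be illustrated with a short simulation. The sketch below is not taken from the thesis: it simulates many fair coin toss games, records the fraction of tosses during which one player is ahead, and compares the result with the classical arcsine law, whose CDF is (2/pi) * arcsin(sqrt(x)). The function names, the tie-handling convention, and the default game sizes are assumptions made for illustration.

```python
import math
import random


def arcsine_cdf(x):
    """CDF of the arcsine law on [0, 1]: (2 / pi) * asin(sqrt(x))."""
    return (2.0 / math.pi) * math.asin(math.sqrt(x))


def fraction_of_time_leading(n_tosses, rng):
    """Fraction of tosses during which player A is ahead in one fair game.

    Assumed convention: a toss that ends in a tie counts as 'leading'
    if A was ahead just before that toss.
    """
    position = 0  # A's wins minus B's wins
    leading = 0
    for _ in range(n_tosses):
        previous = position
        position += 1 if rng.random() < 0.5 else -1
        if position > 0 or (position == 0 and previous > 0):
            leading += 1
    return leading / n_tosses


def simulate(n_games=2000, n_tosses=200, seed=0):
    """Simulate independent games and return the list of leading fractions."""
    rng = random.Random(seed)
    return [fraction_of_time_leading(n_tosses, rng) for _ in range(n_games)]
```

Comparing the share of simulated games with a leading fraction below some threshold against `arcsine_cdf` at that threshold shows the characteristic U shape: one player being ahead almost all of the time is more likely than an even split.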


A Comparison Of Standard Denoising Methods For Peptide Identification, Skylar Carpenter May 2019


Electronic Theses and Dissertations

Peptide identification using tandem mass spectrometry depends on matching the observed spectrum with the theoretical spectrum. The raw data from tandem mass spectrometry, however, is often not optimal because it may contain noise or measurement errors. Denoising this data can improve alignment between observed and theoretical spectra and reduce the number of peaks. The method used by Lewis et al. (2018) uses a combined constant and moving threshold to denoise spectra. We compare the effects of using the standard preprocessing methods (baseline removal, wavelet smoothing, and binning) on spectra with Lewis et al.’s threshold method. We consider individual methods and …


Comparison Of Imputation Methods For Mixed Data Missing At Random, Kaitlyn Heidt May 2019


Electronic Theses and Dissertations

A statistician's job is to produce statistical models. When these models are precise and unbiased, we can relate them to new data appropriately. However, when data sets have missing values, the assumptions of statistical methods are violated and the results are biased. The statistician's objective is to implement methods that produce unbiased and accurate results. Research in missing data is becoming popular as modern methods that produce unbiased and accurate results are emerging, such as MICE in R, a statistical software. Using real data, we compare four common imputation methods in the MICE package in R at different levels of missingness. The …


A Systematic Assessment Of Socio-Economic Impacts Of Prolonged Episodic Volcano Crises, Justin Peers May 2019


Electronic Theses and Dissertations

Uncertainty surrounding volcanic activity can lead to socio-economic crises with or without an eruption as demonstrated by the post-1978 response to unrest of Long Valley Caldera (LVC), CA. Extensive research in physical sciences provides a foundation on which to assess direct impacts of hazards, but fewer resources have been dedicated towards understanding human responses to volcanic risk. To evaluate natural hazard risk issues at LVC, a multi-hazard, mail-based, household survey was conducted to compare perceptions of volcanic, seismic, and wildfire hazards. Impacts of volcanic activity on housing prices and businesses were examined at the county-level for three volcanoes with a …


A Comparison Of Bayesian Estimation Techniques In A Multidimensional Two-Parameter Partial Credit Item Response Model, Peiyan Liu Jan 2019


Electronic Theses and Dissertations

Bayesian estimation methods have shown better performance than the traditional Marginal Maximum Likelihood (MML) estimation method for parameter estimation in relatively simple item response models. However, the extant literature lacks investigation of Bayesian parameter estimation approaches for a multidimensional two-parameter partial credit (M2PPC) model. This simulation study therefore investigated the performance of two Bayesian Markov Chain Monte Carlo (MCMC) algorithms, the Gibbs Sampler and the Hamiltonian Monte Carlo-No-U-Turn-Sampler (HMC-NUTS), for M2PPC models' parameter estimation. It compared the estimation accuracy and computing speed in different combinations of situations, including prior choices, test lengths, and the relationships between dimensions.

The datasets …


Finite Mixture Of Regression Models For Complex Survey Data, Abdelbaset Abdalla Jan 2019


Electronic Theses and Dissertations

Over time, survey data has become an essential source of information for modern society. However, to be effective, the structures of survey data require sampling designs that are more complex than simple random sampling. The complex sampling data collected from enormous national surveys via these complex designs ideally include sample weights that allow analysis to take account of complicated population structures. When the target of inference is the parameters of a regression model, it is crucial to know whether these weights should be incorporated into the sampling weight when fitting the model to the survey data. The finite mixture models …


Applying Machine Learning Algorithms For The Analysis Of Biological Sequences And Medical Records, Shaopeng Gu Jan 2019


Electronic Theses and Dissertations

Modern sequencing technology has revolutionized genomic research and triggered explosive growth of DNA, RNA, and protein sequences. How to infer structure and function from biological sequences is a fundamentally important task in the genomics and proteomics fields. With the development of statistical and machine learning methods, an integrated and user-friendly tool containing state-of-the-art data mining methods is needed. Here, we propose SeqFea-Learn, a comprehensive Python pipeline that integrates multiple steps (feature extraction, dimensionality reduction, feature selection, and predictive model construction based on machine learning and deep learning approaches) to analyze sequences. We used enhancers, RNA N6-methyladenosine sites and …


Development Of A Data-Driven Patient Engagement Score Using Finite Mixture Models, Eric Bae Jan 2019


Electronic Theses and Dissertations

Patient activation measure (PAM), licensed by Insignia Health, is widely adopted by health care providers to assess an individual's knowledge, skill, and confidence for managing one's health and healthcare. Multiple studies corroborate the effectiveness of the activation measure in predicting most health behaviors, including preventive behaviors, healthy behaviors, self-management behaviors, and health information seeking. However, PAM is heavily dependent on subjective patient-reported data, which are often incomplete. The purpose of this study is to develop an objective statistical …


Mefenamic Acid – HPMC AS HG Amorphous Solid Dispersions: Dissolution Enhancement Using Hot Melt Extrusion Technology, Ashay Shukla Jan 2019


Electronic Theses and Dissertations

Mefenamic acid, a BCS class II drug, displays high permeability and low solubility, thereby exhibiting a poor dissolution profile. Hence to improve the solubility and dissolution rate of Mefenamic acid, Hot Melt Extrusion (HME) technique was employed. The amorphous solid dispersion matrix exhibited enhanced dissolution with desired release characteristics. Hydroxypropylmethylcellulose acetate succinate (AquaSolve™ HPMC-AS HG) was used as a carrier with the poloxamer (Kolliphor P407). The drug load was varied from 20% to 40% within the blend. Drug and polymers were blended using a twin shell V-blender for 10 minutes and extruded using an 11mm twin-screw co-rotating extruder (ThermoFisher Scientific, …


Cramer Type Moderate Deviations For Random Fields And Mutual Information Estimation For Mixed-Pair Random Variables, Aleksandr Beknazaryan Jan 2019


Electronic Theses and Dissertations

In this dissertation, we first study Cramer type moderate deviations for partial sums of random fields by applying the conjugate method. In 1938, Cramer published his results on large deviations of sums of i.i.d. random variables, after which a lot of research has been done on establishing Cramer type moderate and large deviation theorems for different types of random variables and for various statistics. In particular, results have been obtained for independent non-identically distributed random variables for the sum of independent random to estimate the mutual information between two random variables. The estimates enjoy a central limit theorem under some …


Regression Tree Construction For Reinforcement Learning Problems With A General Action Space, Anthony S. Bush Jr Jan 2019


Electronic Theses and Dissertations

Part of the implementation of Reinforcement Learning is constructing a regression of values against states and actions and using that regression model to optimize over actions for a given state. One such common regression technique is that of a decision tree; or in the case of continuous input, a regression tree. In such a case, we fix the states and optimize over actions; however, standard regression trees do not easily optimize over a subset of the input variables\cite{Card1993}. The technique we propose in this thesis is a hybrid of regression trees and kernel regression. First, a regression tree splits over …


Variable Selection In Accelerated Failure Time (AFT) Frailty Models: An Application Of Penalized Quasi-Likelihood, Sarbesh R. Pandeya Jan 2019


Electronic Theses and Dissertations

Variable selection is one of the standard ways of selecting models in large scale datasets. It has applications in many fields of research study, especially in large multi-center clinical trials. One of the prominent methods in variable selection is the penalized likelihood, which is both consistent and efficient. However, the penalized selection is significantly challenging under the influence of random (frailty) covariates. It is even more complicated when there is involvement of censoring as it may not have a closed-form solution for the marginal log-likelihood. Therefore, we applied the penalized quasi-likelihood (PQL) approach that approximates the solution for such a …


Safety Constraint Optimization Of Combination Drug Therapy In Hypertension Clinical Trials, Victor Chukwu Jan 2019


Electronic Theses and Dissertations

In clinical practice, combination drug therapy has become common in treating many disease conditions. The purpose of these combinations is often to ensure optimal efficacy and to reduce adverse effects that may arise from monotherapy. Clinical trials have also been conducted to ensure the efficacy and safety of these combinations before they are introduced into the market. However, adverse effects still occur with combination therapies. The objectives of this study are (1) to determine a region of optimum doses of Drug A and Drug B in combination while focusing on efficacy alone, (2) to determine a region of optimum doses …


Essays On Mixture Models, Trevor R. Camper Jan 2019


Electronic Theses and Dissertations

When considering statistical scenarios where one can sample from populations that are not of interest for the purposes of a study, bivariate mixture models can be used to study the effect that this missampling can have on parameter estimation. In this thesis, we will examine the behavior that bivariate mixture models have on two statistical constructs: Cronbach's alpha \cite{C51}, and Spearman's rho \cite{S04}. Chapter 1 will introduce notions of mixture models and the definition of bias under mixture models which will serve as the central concept of this thesis. Chapter 2 will investigate a particular psychometric issue known as insufficient …


Data Patterns Discovery Using Unsupervised Learning, Rachel A. Lewis Jan 2019


Electronic Theses and Dissertations

Self-care activities classification poses significant challenges in identifying children’s unique functional abilities and needs within the exceptional children healthcare system. The accuracy of diagnosing a child's self-care problem, such as toileting or dressing, is highly influenced by an occupational therapist’s experience and time constraints. Thus, there is a need for objective means to detect and predict in advance the self-care problems of children with physical and motor disabilities. We use clustering to discover interesting information from self-care problems, perform automatic classification of binary data, and discover outliers. The advantages are twofold: the advancement of knowledge on identifying self-care problems in …