Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Theses/Dissertations

Statistics and Probability

2019

Articles 1 - 30 of 253

Full-Text Articles in Physical Sciences and Mathematics

Inference Of Heterogeneity In Meta-Analysis Of Rare Binary Events And Rss-Structured Cluster Randomized Studies, Chiyu Zhang Dec 2019

Statistical Science Theses and Dissertations

This dissertation contains two topics: (1) A Comparative Study of Statistical Methods for Quantifying and Testing Between-study Heterogeneity in Meta-analysis with Focus on Rare Binary Events; (2) Estimation of Variances in Cluster Randomized Designs Using Ranked Set Sampling.

Meta-analysis, the statistical procedure for combining results from multiple studies, has been widely used in medical research to evaluate intervention efficacy and safety. In many practical situations, the variation of treatment effects among the collected studies, often measured by the heterogeneity parameter, may exist and can greatly affect the inference about effect sizes. Comparative studies have been done for only one or …


Statistical Analysis Of Social Network Change, Teresa Danielle Schmidt Dec 2019

Dissertations and Theses

This project explores two statistical methods that infer social network structures and statistically test those structures for change over time: regression-based differential network analysis (R-DNA) and information theory-based differential analysis (I-DNA). R-DNA is adapted from bioinformatics and I-DNA employs reconstructability analysis.

This project applies both R-DNA and I-DNA to analyze Medicaid claims data from one-year periods before (May 2011-Apr 2012) and after (Jan 2013-Dec 2013) the formation of the Health Share of Oregon Coordinated Care Organization (CCO). The formation of CCOs was legislated by the state of Oregon in 2012 with the triple aim of improving health outcomes, reducing …


Ordinal Hyperplane Loss, Bob Vanderheyden Dec 2019

Doctor of Data Science and Analytics Dissertations

This research presents the development of a new framework for analyzing ordered class data, commonly called “ordinal class” data. The focus of the work is the development of classifiers (predictive models) that predict classes from available data. Ratings scales, medical classification scales, socio-economic scales, meaningful groupings of continuous data, facial emotional intensity and facial age estimation are examples of ordinal data for which data scientists may be asked to develop predictive classifiers. It is possible to treat ordinal classification like any other classification problem that has more than two classes. Specifying a model with this strategy does not fully utilize …


On Improving Performance Of The Binary Logistic Regression Classifier, Michael Chang Dec 2019

UNLV Theses, Dissertations, Professional Papers, and Capstones

Logistic Regression, being both a predictive and an explanatory method, is one of the most commonly used statistical and machine learning methods in almost all disciplines. There are many situations, however, when the accuracies of the fitted model are low for predicting either the success event or the failure event. Several statistical and machine learning approaches exist in the literature to handle these situations. This thesis presents several new approaches to improve the performance of the fitted model, and the proposed methods have been applied to real datasets.

Transformation of predictors is a common approach in fitting multiple linear and …


Evaluating Parents Sociodemographic Factors And Childhood Vaccine Decisions, Mehret Girmay Dec 2019

UNLV Theses, Dissertations, Professional Papers, and Capstones

Vaccination is considered one of the most successful public health achievements of the 20th century. However, with increasing vaccine skepticism emerging over the past decades, there is a threat to the ongoing sustainment of vaccine coverage within all US communities. This study evaluated and compared parents’ sociodemographic factors associated with childhood vaccine decisions. This study is a secondary analysis of 893 parents/guardians, aged 18-55 years with child(ren) < 7 years living in the U.S.

Predictive analysis using multinomial logistic regression modeling was conducted to examine vaccine decisions (accept, hesitant, and refuse) in relation to parents’ sociodemographic factors. Overall, 66.6% of parents accepted recommended vaccines, while …


A Pedagogic Analysis Of Linear Algebra Courses, Andrew Taylor Dec 2019

Mathematics & Statistics ETDs

This project investigates the question, "Do our applied linear algebra courses (at the University of New Mexico) adequately prepare STEM students for future work in their respective fields?" To explore this, surveys were issued to three groups (sections) of students (taught by two different instructors) at the conclusion of their applied linear algebra course, as well as to STEM professors/instructors from a variety of STEM fields. Students were surveyed regarding their perceived mastery of given topics/ideas from the course and professors/instructors were surveyed about the level of mastery they felt was necessary (referred to as "desired mastery") …


The Negotiator's Role In A Buyer-Seller Game, Joseph Gaudy Dec 2019

Graduate Theses and Capstone Projects (excluding DNP)

In game theory, buyer-seller games rarely utilize a negotiating third party. Any negotiations are typically conducted by the buyer and seller. This study, motivated by the real estate market, uses sequentially and simultaneously played game models to explore the influence a self-interested, negotiating, third party has on player payoffs. For the sequential model, a game tree is utilized to demonstrate player actions, preferences, and outcomes. The weak sequential equilibrium is calculated using Gambit[1] and shows optimality in player payoffs to exist when the seller’s and realtor’s strategies align according to the current market. For the simultaneous model, expected payoff functions …


Evaluation Of Relationship Between Lead-Dust Loading, Lead-Dust Concentration, And Total Dust Loading Metrics Across Multiple Data Sets, Charles Bevington Dec 2019

Capstone Experience

Lead-dust monitoring studies report values as either lead-dust loadings (µg/ft²) or as lead-dust concentrations (µg/g). It is rare for studies to report both metrics. When only lead-dust loading values are present, professionals require an approach to estimate lead-dust concentration values. A literature search identified five studies that contained raw data for both lead-dust loading and lead-dust concentration. An additional thirty-two studies had summary statistics available for both lead-dust loading and lead-dust concentration. Studies with raw data were used to develop an empirically based loading-to-concentration statistical relationship. Raw data sets were critically evaluated to determine whether elimination or …


Communications And Methodologies In Crime Geography: Contemporary Approaches To Disseminating Criminal Incidence And Research, Mitchell Ogden Dec 2019

Electronic Theses and Dissertations

Many tools exist to assist law enforcement agencies in mitigating criminal activity. For centuries, academics have used statistics in the study of crime and criminals, and more recently, police departments have made use of spatial statistics and geographic information systems in that pursuit. Clustering and hot spot methods of analysis are popular in this application for their relative simplicity of interpretation and ease of process. With recent advancements in geospatial technology, it is easier than ever to publicly share data through visual communication tools like web applications and dashboards. Sharing data and results of analyses boosts transparency and the public image of …


A Signature Enrichment Design With Bayesian Adaptive Randomization For Cancer Clinical Trials, Fang Xia Dec 2019

Dissertations & Theses (Open Access)

Clinical trials in the era of precision medicine demand more flexible and efficient trial designs. Adaptive clinical trial designs allow pre-specified modifications of an ongoing clinical trial and could shorten the trial duration. We reviewed five common types of adaptive clinical trials based on adaptation methods. In particular, outcome-adaptive randomization has become more popular as it can assign more patients to the promising treatments based on the accumulated trial data. This data-driven allocation allows more patients to benefit from the trial, which is especially important for cancer patients. We compared different Bayesian outcome-adaptive randomization methods and discussed them from both methodological and …


Function Space Tensor Decomposition And Its Application In Sports Analytics, Justin Reising Dec 2019

Electronic Theses and Dissertations

Recent advancements in sports information and technology systems have ushered in a new age of applications of both supervised and unsupervised analytical techniques in the sports domain. These automated systems capture large volumes of data points about competitors during live competition. As a result, multi-relational analyses are gaining popularity in the field of Sports Analytics. We review two case studies of dimensionality reduction with Principal Component Analysis and latent factor analysis with Non-Negative Matrix Factorization applied in sports. Also, we provide a review of a framework for extending these techniques for higher order data structures. The primary scope of this …


Using A Discrete Choice Experiment To Estimate Willingness To Pay For Location Based Housing Attributes, Kristopher C. Toll Dec 2019

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

In 1993, a travel study was conducted along the Wasatch Front in Utah (Research Systems Group INC, 2013). The main purpose of this study was to assess travel behavior to understand the needs for future growth in Utah. Since then, the Research Service Group (RSG) conducted a new study in 2012 to understand current travel preferences in Utah. This survey, called the Residential Choice Stated Preference survey, asked respondents to make ten choice comparisons between two hypothetical homes. Each home in the choice comparison was described by different attributes: type of neighborhood, distance from …


Detecting Differentially Co-Expressed Gene Modules Via The Edge-Count Test, Anne Gratius Lin Dec 2019

Graduate Theses and Dissertations

Background

Gene expression profiling by microarray has been used to uncover molecular variations in many different diseases. Complementary to conventional differential expression analysis, differential co-expression analysis can identify gene markers at both the systematic and granular levels. Differential co-expression network analysis has three aspects: global topological comparison of networks, identification of differential co-expression clusters, and identification of differentially co-expressed genes and gene pairs. To date, most of the available methods still rely on Pearson’s correlation coefficient despite its insensitivity to nonlinear relationships.

Results

Here we present an approach that is robust to nonlinearity by using the edge-count test for differential co-expression analysis. …


Statistical Methods For Estimating And Testing Treatment Effect For Multiple Treatment Groups In Observational Studies., Xiaofang Yan Dec 2019

Electronic Theses and Dissertations

Note: Abstract would not save due to an issue with some of the characters.


Seasonal Time Series Models With Application To Weather And Lake Level Data, Mengqing Qin Dec 2019

MSU Graduate Theses

This work studies seasonal time series models with application to lake level and weather data. The thesis covers related time series concepts, autoregressive integrated moving average (ARIMA) models, parameter estimation, model diagnostics, and forecasting. The studied time series models are applied to daily lake level data from Beaver Lake (1988-2017) and daily maximum temperature data from New York Central Park (1870-2017). Due to the seasonality of the data, three different modeling approaches are proposed: the regression method, the functional ARIMA method and the multiplicative seasonal ARIMA method. The forecasted values of the year 2018 are compared …


Habitat Associations And Reproduction Of Fishes On The Northwestern Gulf Of Mexico Shelf Edge, Elizabeth Marie Keller Nov 2019

LSU Doctoral Dissertations

Several of the northwestern Gulf of Mexico (GOM) shelf-edge banks provide critical hard bottom habitat for coral and fish communities, supporting a wide diversity of ecologically and economically important species. These sites may be fish aggregation and spawning sites and provide important habitat for fish growth and reproduction. Already designated as habitat areas of particular concern, many of these banks are also under consideration for inclusion in the expansion of the Flower Garden Banks National Marine Sanctuary. This project aimed to gain a more comprehensive understanding of the communities and fish species on shelf-edge banks by way of gonad histology, …


Fractional Random Weighted Bootstrapping For Classification On Imbalanced Data With Ensemble Decision Tree Methods, Sean Charles Carter Nov 2019

USF Tampa Graduate Theses and Dissertations

Ensemble methods are commonly used for building predictive models for classification. Models that are unstable to perturbations in the training set, such as the decision tree, often see considerable reductions in error when grouped, using bootstrapped resamples of the training data to train many models. The non-parametric bootstrap, however, has limited efficacy when used on severely imbalanced data, especially when the number of observations of one or more classes is exceptionally small. We explore the fractional random weighted bootstrap, which randomly assigns fractional weights to observations, as an alternative resampling procedure in training machine learning ensembles, particularly decision tree …
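
The contrast between the two resampling schemes in this abstract can be sketched in a few lines. This is only an illustration, not the thesis's implementation: it assumes the fractional weights are drawn from a uniform Dirichlet distribution and scaled to sum to the sample size, and the label vector `y` and the helper name `frw_weights` are hypothetical.

```python
import numpy as np


def frw_weights(n_obs, rng):
    """Draw one set of fractional random weights that sum to n_obs.

    Assumption: uniform Dirichlet weights scaled by n_obs. The abstract
    does not state which weight distribution the thesis uses.
    """
    return n_obs * rng.dirichlet(np.ones(n_obs))


rng = np.random.default_rng(0)

# Hypothetical, severely imbalanced labels: 97 majority, 3 minority.
y = np.array([0] * 97 + [1] * 3)
n = len(y)

# Classical non-parametric bootstrap: integer counts per observation.
# Many counts are zero, so a resample can miss the minority class entirely.
idx = rng.integers(0, n, size=n)
counts = np.bincount(idx, minlength=n)

# Fractional random weights: no observation is dropped from the resample.
w = frw_weights(n, rng)

minority_share_bootstrap = counts[y == 1].sum() / counts.sum()
minority_share_frw = w[y == 1].sum() / w.sum()
```

Weights of this kind could then be passed to a tree learner that accepts per-observation weights, for example through the `sample_weight` argument of scikit-learn's `DecisionTreeClassifier.fit`, with one fresh draw per ensemble member.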


On The Sparre-Andersen Risk Models, Ruixi Zhang Oct 2019

Electronic Thesis and Dissertation Repository

This thesis develops several strategies for calculating ruin-related quantities for a variety of extended risk models. We focus on the Sparre-Andersen risk model, also known as the renewal risk model. The idea of an arbitrary distribution for the waiting time between claim payments arose in the 1950s from collective risk theory, and has received many extensions and modifications in recent years. Our goal is to tackle model assumptions that are either too relaxed for traditional methods to apply, or so complicated that elaborate algebraic tools are needed to obtain explicit solutions.

In Chapter 2, we consider a Lévy risk process and …


Joint Asymptotics For Smoothing Spline Semiparametric Nonlinear Models, Jiahui Yu Oct 2019

Doctoral Dissertations

We study the joint asymptotics of general smoothing spline semiparametric models in the settings of density estimation and regression. We provide a systematic framework which incorporates many existing models as special cases, and further allows for nonlinear relationships between the finite-dimensional Euclidean parameter and the infinite-dimensional functional parameter. For both density estimation and regression, we establish the local existence and uniqueness of the penalized likelihood estimators for our proposed models. In the density estimation setting, we prove joint consistency and obtain the rates of convergence of the joint estimator in an appropriate norm. The convergence rate of the parametric component …


Characterization Of The Anomalous pH Of Aqueous Nanoemulsions, Kieran P. Ramos Oct 2019

Doctoral Dissertations

Aqueous water-in-oil nanoemulsions have emerged as a versatile tool for use in microfluidics, drug delivery, single-molecule measurements, and other research. Nanoemulsions are often prepared with perfluorocarbons which are remarkably biocompatible due to their stability, low surface tension, lipophobicity, and hydrophobicity. Therefore, it is often assumed that droplet contents are unperturbed by the perfluorinated surface. However, in microemulsions, which are similar to nanoemulsions, it is known that either the pH of the aqueous phase or the ionization constants of encapsulated molecules are different from bulk solution. There is also recent evidence of low pH in perfluorinated aqueous nanoemulsions. The current underlying …


Function And Dissipation In Finite State Automata - From Computing To Intelligence And Back, Natesh Ganesh Oct 2019

Doctoral Dissertations

Society has benefited from the technological revolution and the tremendous growth in computing powered by Moore's law. However, we are fast approaching the ultimate physical limits in terms of both device sizes and the associated energy dissipation. It is important to characterize these limits in a physically grounded and implementation-agnostic manner, in order to capture the fundamental energy dissipation costs associated with performing computing operations with classical information in nano-scale quantum systems. It is also necessary to identify and understand the effect of quantum indistinguishability, noise, and device variability on these dissipation limits. Identifying these parameters is crucial to designing …


Model-Form Uncertainty Quantification For Predictive Probabilistic Graphical Models, Jinchao Feng Oct 2019

Doctoral Dissertations

In this thesis, we focus on Uncertainty Quantification and Sensitivity Analysis, which can provide performance guarantees for predictive models built with both aleatoric and epistemic uncertainties, as well as data, and identify which components in a model have the most influence on predictions of our quantities of interest. In the first part (Chapter 2), we propose non-parametric methods for both local and global sensitivity analysis of chemical reaction models with correlated parameter dependencies. The developed mathematical and statistical tools are applied to a benchmark Langmuir competitive adsorption model on a close packed platinum surface, whose parameters, estimated from quantum-scale computations, …


Statistical Modeling And Characterization Of Induced Seismicity Within The Western Canada Sedimentary Basin, Sid Kothari Oct 2019

Electronic Thesis and Dissertation Repository

In western Canada, there has been an increase in seismic activity linked to anthropogenic energy-related operations including conventional hydrocarbon production, wastewater fluid injection and more recently hydraulic fracturing (HF). Statistical modeling and characterization of the space, time and magnitude distributions of the seismicity clusters is vital for a better understanding of induced earthquake processes and development of predictive models. In this work, a statistical analysis of the seismicity in the Western Canada Sedimentary Basin was performed across past and present time periods by utilizing a compiled earthquake catalogue for Alberta and eastern British Columbia. Specifically, the frequency-magnitude statistics were analyzed …


Adaptive Feature Engineering Modeling For Ultrasound Image Classification For Decision Support, Hatwib Mugasa Oct 2019

Doctoral Dissertations

Ultrasonography is considered a relatively safe option for the diagnosis of benign and malignant cancer lesions due to the low-energy sound waves used. However, the visual interpretation of the ultrasound images is time-consuming and usually has high false alerts due to speckle noise. Improved methods of collecting image-based data have been proposed to reduce noise in the images; however, this has proved not to solve the problem due to the complex nature of images and the exponential growth of biomedical datasets. Secondly, the target class in real-world biomedical datasets, that is the focus of interest of a biopsy, is usually …


Time Series Analysis Of Weather Data In South Carolina, Geophrey Odero Oct 2019

Theses and Dissertations

This thesis discusses time series analysis of weather data in South Carolina for the last fifteen years (January 2003 to December 2017) for Columbia, Greenville and North Myrtle Beach. The first part presents a brief overview of the variables used in the analysis: temperature, dew point, humidity and sea level pressure. A short discussion of time series data is also introduced. The second part is about modeling the variables. The models of choice are presented and fitted, and model diagnostics are carried out. In the third part, we discuss background on climates of the cities and model …


Probabilistic Modeling Of Democracy, Corruption, Hemophilia A And Prediabetes Data, A. K. M. Raquibul Bashar Sep 2019

USF Tampa Graduate Theses and Dissertations

Parametric analysis of any real-world data is the most powerful tool to characterize the probabilistic behavior in social, economic, medical, epidemiological, and other areas of study. In the present study, we identify the theoretical Probability Distribution Function (PDF) for Democracy Index Scores (DIS) from the Economist Intelligence Unit (EIU) database and obtain maximum likelihood estimates for the theoretical PDFs. We also identify the individual PDFs for each of the clusters (Full Democracy, Flawed Democracy, Hybrid Regime, and Authoritarian Regime) defined by the EIU.

A statistical model is a convenient instrument to predict the future value of any …


Statistical L-Moment And L-Moment Ratio Estimation And Their Applicability In Network Analysis, Timothy S. Anderson Sep 2019

Theses and Dissertations

This research centers on finding the statistical moments, network measures, and statistical tests that are most sensitive to various node degradations for the Barabási-Albert, Erdős-Rényi, and Watts-Strogatz network models. Thirty-five different graph structures were simulated for each of the random graph generation algorithms, and sensitivity analysis was undertaken on three different network measures: degree, betweenness, and closeness. In an effort to find the statistical moments that are the most sensitive to degradation within each network, four traditional moments (mean, variance, skewness, and kurtosis) as well as three non-traditional moments (L-variance, L-skewness, and L-kurtosis) were examined. Each of these moments were …
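
For readers unfamiliar with the non-traditional moments named here, the sketch below computes sample L-moments from the standard unbiased probability-weighted moments. It is an illustration under stated assumptions rather than the thesis's code: the function name `sample_l_moments` and the example `degrees` vector are hypothetical, and the second L-moment (often called L-scale) is assumed to be the quantity the abstract calls L-variance.

```python
import numpy as np


def sample_l_moments(x):
    """Return (l1, l2, t3, t4): L-location, L-scale, L-skewness, L-kurtosis.

    Uses the unbiased probability-weighted moments b0..b3 of the ordered
    sample. Needs at least four observations that are not all equal
    (otherwise l2 is zero and the ratios are undefined).
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    if n < 4:
        raise ValueError("need at least four observations")
    i = np.arange(1, n + 1)  # ranks of the ordered sample
    b0 = x.mean()
    b1 = np.sum((i - 1) / (n - 1) * x) / n
    b2 = np.sum((i - 1) * (i - 2) / ((n - 1) * (n - 2)) * x) / n
    b3 = np.sum((i - 1) * (i - 2) * (i - 3)
                / ((n - 1) * (n - 2) * (n - 3)) * x) / n
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3 / l2, l4 / l2


# Hypothetical node-degree sample from a small graph.
degrees = [1, 2, 3, 4, 5]
l1, l2, t3, t4 = sample_l_moments(degrees)
```

Because L-moments are linear combinations of order statistics, they tend to be less affected by a few extreme values than conventional skewness and kurtosis, which is one common reason for considering them alongside the traditional moments.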


Semi-Supervised Regression With Generative Adversarial Networks Using Minimal Labeled Data, Greg Olmschenk Sep 2019

Dissertations, Theses, and Capstone Projects

This work studies the generalization of semi-supervised generative adversarial networks (GANs) to regression tasks. A novel feature layer contrasting optimization function, in conjunction with a feature matching optimization, allows the adversarial network to learn from unannotated data and thereby reduce the number of labels required to train a predictive network. An analysis of simulated training conditions is performed to explore the capabilities and limitations of the method. In concert with the semi-supervised regression GANs, an improved label topology and upsampling technique for multi-target regression tasks are shown to reduce data requirements. Improvements are demonstrated on a wide variety of vision …


Sample Size Requirements And Considerations For Models To Assess Human-Machine System Performance, Jennifer S. G. Lopez Sep 2019

Theses and Dissertations

Hierarchical Linear Models (HLMs), also known as multi-level models, are an extension of multiple regression analysis and can aid in the understanding of human and machine workloads of a system. These models allow for prediction and testing in systems with hierarchies of two or more levels. The complex interrelated variability of these multi-level models exists in operational settings, such as the Air Force Distributed Common Ground System Full Motion Video (AF DCGS FMV) community which is composed of individuals (Level-1), groups (Level-2), units (Level-3), and organizations (Level-4). Through the development of sample size requirements and considerations for multi-level models, this …


Towards Using Model Averaging To Construct Confidence Intervals In Logistic Regression Models, Artem Uvarov Aug 2019

Electronic Thesis and Dissertation Repository

Regression analyses in epidemiological and medical research typically begin with a model selection process, followed by inference assuming the selected model has generated the data at hand. It is well-known that this two-step procedure can yield biased estimates and invalid confidence intervals for model coefficients due to the uncertainty associated with the model selection. To account for this uncertainty, multiple models may be selected as a basis for inference. This method, commonly referred to as model-averaging, is increasingly becoming a viable approach in practice.
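
The averaging step described above can be illustrated with one widely used weighting scheme. This is a sketch under stated assumptions, not the method of the thesis: the abstract does not say how models are weighted, so Akaike weights are assumed here, and the AIC values and coefficient estimates are made-up numbers for three hypothetical candidate logistic regression models.

```python
import numpy as np


def akaike_weights(aic):
    """Convert AIC values into normalized model weights.

    Assumption: Akaike weights, w_i proportional to exp(-delta_i / 2),
    where delta_i is the AIC difference from the best model.
    """
    aic = np.asarray(aic, dtype=float)
    delta = aic - aic.min()
    rel_likelihood = np.exp(-0.5 * delta)
    return rel_likelihood / rel_likelihood.sum()


def model_averaged_estimate(estimates, weights):
    """Weighted average of one coefficient across candidate models."""
    return float(np.dot(weights, estimates))


# Hypothetical AIC values and estimates of the same coefficient.
aic = [100.0, 102.0, 110.0]
beta = [0.50, 0.40, 0.10]

w = akaike_weights(aic)
beta_avg = model_averaged_estimate(beta, w)
```

The point estimate is the easy part. Attaching a valid confidence interval to `beta_avg` is harder because the weights themselves depend on the data, which is the difficulty the title of this thesis points to.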

Previous research has demonstrated the advantage of model-averaging in reducing bias of parameter estimates. However, there …