Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistics and Probability

Full-Text Articles in Physical Sciences and Mathematics

Theoretical And Computational Aspects Of Robust Cluster Analysis For Multivariate And High-Dimensional Datasets, Andrews Tawiah Anum May 2023

Open Access Theses & Dissertations

Multivariate and high-dimensional datasets typically contain subgroups that may not be immediately apparent. To reveal these groups, cluster analysis is performed. Cluster analysis is an unsupervised machine learning technique commonly employed to partition a dataset into distinct categories referred to as clusters. The k-means algorithm is a prominent distance-based clustering method. Despite its overwhelming popularity, the algorithm is not invariant under non-singular affine transformations and is not robust, i.e., it can be unduly influenced by outliers. To address these deficiencies, we propose an alternative model-based clustering procedure by minimizing a “trimmed” variant of the negative log-likelihood function. We develop a “concentration step”, …


Flexible Models For The Estimation Of Treatment Effect, Habeeb Abolaji Bashir May 2023

Open Access Theses & Dissertations

Estimation of treatment effect is an important problem which is well studied in the literature. While regression models are among the most commonly used techniques for the estimation of treatment effect, they are prone to model misspecification. To reduce model misspecification bias, flexible nonparametric models have been introduced for the estimation. Continuing this line of research, we propose two flexible nonparametric models that allow the treatment effect to vary across different levels of covariates. We provide estimation algorithms for both models. Using simulations and data analysis, we illustrate the usefulness of the proposed methods.


Developing A Risk Assessment Instrument For Immigration Cases Under Federal Supervision, Mayra Eydie Pacheco May 2023

Open Access Theses & Dissertations

No abstract provided.


Outlier Detection In Multivariate And High-Dimensional Datasets, Yuanhong Wu May 2023

Open Access Theses & Dissertations

Accurate detection of outliers is crucial in the field of statistical analysis. Using classical statistical models without considering the presence of outliers in the data can lead to misleading outcomes. There exist a myriad of procedures to detect outliers in statistics. We concentrate on the statistical techniques that can robustly identify outliers in data sets. To this end, we pursue two aims. First, we give an extensive overview of robust statistical methods for outlier detection that remain popular in recent years. We provide the definitions and algorithms, and also discuss some important properties of these methods. Second, two real examples are …


Spatially Adaptive Estimation Of Spectrum, Yi Xie May 2023

Open Access Theses & Dissertations

A time series may be analyzed either in the time or in the frequency domain. When working in the frequency domain, the main objective is to estimate the underlying spectrum. Various approaches have been proposed to this end, but most are based on smoothing the periodogram using a single smoothing parameter across all Fourier frequencies. Such a global smoothing parameter may result in a biased estimate. To improve the estimation, in this paper, we smooth the log periodogram by placing a dynamic shrinkage prior, such that varying degrees of smoothing may be applied to different regions of the Fourier frequencies, …


Performance Classification Of Ornstein-Uhlenbeck-Type Models Using Fractal Analysis Of Time Series Data., Peter Kwadwo Asante May 2023

Open Access Theses & Dissertations

This dissertation aims to assess the performance of Ornstein-Uhlenbeck-type models by examining the fractal characteristics of time series data from various sources, including finance, volcanic and earthquake events, US COVID-19 reported cases and deaths, and two simulated time series with differing properties. The time series data is categorized as either a Gaussian or a Lévy process (Lévy walk or Lévy flight) by using three scaling methods: Rescaled range analysis, Detrended fluctuation analysis, and Diffusion entropy analysis. The outcomes of this analysis indicate that the financial indices are classified as Lévy walks, while the volcanic, earthquake, and COVID-19 data are classified …


Non-Destructive Imaging Of Phytosulfokine Trafficking In Plants Using Fiber-Optic Fluorescence Microscopy, Bernard Abakah May 2023

Electronic Theses and Dissertations

Plants secrete peptide ligands and use receptor signaling to respond to stress and control development. Understanding these phenomena is key to improving plant health and productivity for food, fiber, and energy applications. Phytosulfokine (PSK), a sulfated peptide hormone, regulates plant cell division, growth, and stress tolerance via specific phytosulfokine receptors (PSKRs). This study uses fiber-optic fluorescence microscopy to elucidate trafficking of PSK in live plants. The microscope features two-color optics and an objective lens connected to a 1-m coherent imaging fiber mounted on either a conventional upright microscope body or 5-axis positioning system (X–Y–Z plus pitch and yaw). PSK and …


Generalized Additive Model Using Marginal Integration Estimation Techniques With Interactions, Tahiru Mahama May 2023

Open Access Theses & Dissertations

Marginal Integration (MI) is a statistical method that is extensively employed to estimate component functions of nonparametric additive models. The shortcoming of the purely additive model is that interaction between predictor variables is often ignored, and it may produce poor performance in some real applications. As a result, this research considers the second-order interactions in the regression models. The primary objective is to use marginal integration techniques to estimate the nonparametric additive functions. We compare this model with other models/estimators such as the Generalized Additive Model (GAM), Generalized Additive Model with Selection (GAMSEL), Robust Marginal Integration (RMI), Ordinary Least Squares …


Examining The Effect Of Word Embeddings And Preprocessing Methods On Fake News Detection, Jessica Hauschild May 2023

Department of Statistics: Dissertations, Theses, and Student Work

The words people choose to use hold a lot of power, whether that be in spreading truth or deception. As listeners and readers, we do our best to understand how words are being used. There are many current methods in computer science literature attempting to embed words into numerical information for statistical analyses. Some of these embedding methods, such as Bag of Words, treat words as independent, while others, such as Word2Vec, attempt to gain information about the context of words. It is of interest to compare how well these various methods of translating text into numerical data work specifically …


Effects Of Land Use On Soil Microbial Communities In Tropical Montane Forests Of Malaysian Borneo, Yang Kai Tang May 2023

Graduate Theses and Dissertations

Land use, such as logging and forest conversion to agriculture, can modify soil physicochemical and biological properties, and affect soil health. To understand how land use change can impact soil properties and canopy structure, we used a land use gradient in Malaysian Borneo consisting of six sites, including old growth forests, mixed forests, and agriculture fields. Specifically, we aimed to answer the following questions: (1) How do soil physicochemical properties vary across land use types? (2) Do bacterial diversity and composition vary across different land use types? (3) Do fungal diversity and composition vary across different land use types? We …


Gentrification And Crime In The Twin Cities: Insights And Challenges Through A Statistical Lens, Erin G. Franke May 2023

Mathematics, Statistics, and Computer Science Honors Projects

Gentrification is a complex process of urban redevelopment that typically involves an in-migration of educated people to neighborhoods experiencing a period of disinvestment. While gentrification is widely recognized for its potential to displace long-time businesses and residents of the neighborhood, its impact on crime is highly controversial. There is no consensus on the relationship between gentrification and crime across criminological theory, and past statistical studies have also shown contradictory results. Measuring gentrification on the tract level with census data, we seek to understand gentrification’s relationship with violent crime and theft in the Twin Cities. Using a Poisson model with …


A Brascamp-Lieb–Rary Of Examples, Anina Peersen May 2023

Mathematics, Statistics, and Computer Science Honors Projects

This paper focuses on the Brascamp-Lieb inequality and its applications in analysis, fractal geometry, computer science, and more. It provides a beginner-level introduction to the Brascamp-Lieb inequality alongside related inequalities in analysis and explores specific cases of extremizable, simple, and equivalent Brascamp-Lieb data. Connections to computer science and geometric measure theory are introduced and explained. Finally, the Brascamp-Lieb constant is calculated for a chosen family of linear maps.


The Last Drought Frontier: Building A Drought Index For The State Of Alaska, Olivia Campbell May 2023

School of Natural Resources: Dissertations, Theses, and Student Research

Drought is characterized by periods of below average precipitation. There are five major types of drought recognized in the literature: meteorological, hydrological, agricultural, socioeconomic, and ecological. A relatively new concept in the drought literature is “snow drought.” A key part of the definition of drought is that it is not always accompanied by extreme heat. This means drought can occur even in cold climates, cold seasons, and higher latitudes and altitudes, like Alaska. Drought is a natural part of climate variability, but Alaska’s climate is changing faster than any other state in the United States. Alaska is no stranger to …


An Analysis Of Changes In Seasonal Dynamics And Generational Differences In The Maine Lobster Fishery, Emily Fitting May 2023

Electronic Theses and Dissertations

The American lobster (Homarus americanus) supports the most valuable single species fishery in the US. Lobster landings have been increasing steadily for the last three decades, but before that landings were more variable. The high value of the lobster fishery combined with the decline of other commercially important species in this region has created increasing dependence on the resource, and previous research questions the resilience of the fishery in the face of social and environmental changes.

Important lobster life history processes, including migration patterns, growth rates, and reproduction, are driven by ocean bottom temperature, which creates a strong seasonal cycle …


Mixing Measures For Trees Of Fixed Diameter, Ari Holcombe Pomerance May 2023

Mathematics, Statistics, and Computer Science Honors Projects

A mixing measure is the expected length of a random walk in a graph given a set of starting and stopping conditions. We determine the tree structures of order n with diameter d that minimize and maximize for a few mixing measures. We show that the maximizing tree is usually a broom graph or a double broom graph and that the minimizing tree is usually a seesaw graph or a double seesaw graph.
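The abstract does not say which mixing measures the thesis studies. As an illustration only, the sketch below uses the simplest quantity of this kind, the expected hitting time of a simple random walk (start at one vertex, stop on first reaching another), and computes it exactly on small trees. The function names `hitting_times`, `path_graph`, and `star_graph` are hypothetical helpers introduced for this example.

```python
from fractions import Fraction


def hitting_times(adj, target):
    """Expected steps for a simple random walk to first reach `target`.

    `adj` maps each vertex to a list of neighbours (connected, undirected).
    Solves h(target) = 0 and h(v) = 1 + average of h over neighbours of v.
    """
    verts = [v for v in adj if v != target]
    idx = {v: i for i, v in enumerate(verts)}
    n = len(verts)
    # Augmented matrix of the linear system (I - P) h = 1 on non-target vertices.
    A = [[Fraction(0)] * (n + 1) for _ in range(n)]
    for v in verts:
        i = idx[v]
        A[i][i] += 1
        A[i][n] = Fraction(1)
        for w in adj[v]:
            if w != target:
                A[i][idx[w]] -= Fraction(1, len(adj[v]))
    # Gauss-Jordan elimination with exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [x / pv for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    h = {v: A[idx[v]][n] for v in verts}
    h[target] = Fraction(0)
    return h


def path_graph(n):
    """Path 0 - 1 - ... - (n-1): the tree of order n with the largest diameter."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def star_graph(n):
    """Star with centre 0 and leaves 1..n-1: a tree of order n with diameter 2."""
    adj = {0: list(range(1, n))}
    for i in range(1, n):
        adj[i] = [0]
    return adj


if __name__ == "__main__":
    # End-to-end hitting time on a path versus leaf-to-leaf on a star, order 5.
    print(hitting_times(path_graph(5), 4)[0])
    print(hitting_times(star_graph(5), 2)[1])
```

Comparing the two trees of the same order shows how strongly the tree shape affects such an expectation, which is the kind of extremal question the abstract describes.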


UConn Baseball Batting Order Optimization, Gavin Rublewski May 2023

Honors Scholar Theses

Challenging conventional wisdom is at the very core of baseball analytics. Using data and statistical analysis, the sets of rules by which coaches make decisions can be justified, or possibly refuted. One of those sets of rules relates to the construction of a batting order. Through data collection, data adjustment, the construction of a baseball simulator, and the use of a Monte Carlo Simulation, I have assessed thousands of possible batting orders to determine the roster-specific strategies that lead to optimal run production for the 2023 UConn baseball team. This paper details a repeatable process in which basic player statistics …


Predicting High-Cap Tech Stock Polarity: A Combined Approach Using Support Vector Machines And Bidirectional Encoders From Transformers, Ian L. Grisham May 2023

Electronic Theses and Dissertations

The abundance, accessibility, and scale of data have engendered an era where machine learning can quickly and accurately solve complex problems, identify complicated patterns, and uncover intricate trends. One research area where many have applied these techniques is the stock market. Yet, financial domains are influenced by many factors and are notoriously difficult to predict due to their volatile and multivariate behavior. However, the literature indicates that public sentiment data may exhibit significant predictive qualities and improve a model’s ability to predict intricate trends. In this study, momentum SVM classification accuracy was compared between datasets that did and did not …


An Integer Garch Model For A Poisson Process With Time-Varying Zero-Inflation, Isuru Panduka Ratnayake, V. A. Samaranayake May 2023

Mathematics and Statistics Faculty Research & Creative Works

A serially dependent Poisson process with time-varying zero-inflation is proposed. Such formulations have the potential to model count data time series arising from phenomena such as infectious diseases that ebb and flow over time. The model assumes that the intensity of the Poisson process evolves according to a generalized autoregressive conditional heteroscedastic (GARCH) formulation and allows the zero-inflation parameter to vary over time and be governed by a deterministic function or by an exogenous variable. Both the expectation maximization (EM) and the maximum likelihood estimation (MLE) approaches are presented as possible estimation methods. A simulation study shows that both parameter …


A Generalized Family Of Exponentiated Composite Distributions With Applications To Insurance And Survival Data, Bowen Liu May 2023

UNLV Theses, Dissertations, Professional Papers, and Capstones

The concept of composite distributions was proposed in the early 2000s as a good parametric solution for modeling data with heavy tails. Since the concept was proposed, it has been widely used in different areas, such as modeling insurance claim size data, predicting the risk measures in insurance data analysis, fitting survival time data, and modeling precipitation data. While many composite distributions have performed well in real applications, several commonly used ones, such as the inverse gamma-Pareto (IGP) or exponential-Pareto (EP), did not perform well when fitted to several particular data sets. In order …


Multidimensional Investigation Of Tennessee’s Urban Forest, Jillian L. Gorrell May 2023

Doctoral Dissertations

Preserving existing trees in urban areas and properly cultivating urban forest conservation and management opportunities is valuable to the ever-growing urban environment and necessary for creating optimal experiences and educational tools to meet the needs of increasing urban populations. This dissertation contains studies investigating several facets of the urban forest, including environmental effects of deforestation and urbanization, tree equity, and urban forest facility management and accessibility. Community education and outreach at arboreta about the importance of the tree canopy can help promote environmental stewardship. A digital questionnaire was electronically distributed to representatives of arboreta certified through the Tennessee Division of …


A Machine Learning Approach To Obese-Inflammatory Phenotyping, Tania Mayleth Vargas May 2023

Theses and Dissertations

Obesity is the accumulation of an abnormal, or excessive, amount of fat in the body, which can have negative effects on overall health. This excess accumulation of macronutrients in adipose tissue can cause the release of inflammatory mediators, leading to a proinflammatory state. Inflammation is a known risk factor for various health conditions, including cardiovascular diseases, metabolic syndrome, and diabetes. This study sought to examine the use of data mining methods, particularly clustering algorithms, to identify inflammatory biomarker phenotypes and their association with obesity in a local adolescent population. The algorithms evaluated in this study included: k-means, Ward's hierarchical …


A Machine Learning Approach To Evaluate The Effect Of Sodium-Glucose Cotransporter-2 Inhibitors On Chronic Kidney Disease In Diabetes Patients, Solomon Eshun May 2023

Theses and Dissertations

Chronic kidney disease (CKD) is a significant complication that contributes to diabetes-related mortality in the United States, and there is growing evidence that sodium-glucose cotransporter 2 inhibitors (SGLT2i) can slow its progression. However, observational studies may suffer from confounding by indication, where patient characteristics and disease severity influence the decision to prescribe SGLT2i. This study utilized electronic health records of individuals with diabetes (from TriNetX) to investigate the effectiveness of SGLT2i on CKD progression. The database provided detailed information on patients’ CKD status, demographics, diagnosis, procedures, and medications, along with corresponding dates of diagnosis and prescription. The study comprised of …


Small But Mighty: Examining The Utility Of Microstatistics In Modeling Ice Hockey, Matt Palmer May 2023

Senior Honors Theses

As research into hockey analytics continues, an increasing number of metrics are being introduced into the knowledge base of the field, creating a need to determine whether various stats are useful or simply add noise to the discussion. This paper examines microstatistics – manually tracked metrics which go beyond the NHL’s publicly released stats – both through the lens of meta-analytics (which attempt to objectively assess how useful a metric is) and modeling game probabilities. Results show that while there is certainly room for improvement in understanding and use of microstats in modeling, the metrics overall represent an area of …


A Monte Carlo Analysis Of Nonprobability Sampling & Post Hoc Corrections, Julia Hong May 2023

Masters Theses & Specialist Projects

Nonprobability samples are often used in place of probability samples because the former are less trouble and less expensive. Unfortunately, it is difficult to determine how well a sample represents population parameters when using nonprobability samples. Researchers attempt to mitigate the disadvantages of nonprobability sampling by performing post hoc corrections, but this adjustment may not successfully undo the effects of nonprobability sampling. To examine these effects, a Monte Carlo simulation was conducted to create a pseudo-population from which samples were drawn. Forty-one conditions were replicated 10,000 times each, with each sample consisting of 100 observations. A post-stratification adjustment was made …


Dynamics Of Inertial And Non-Inertial Particles In Geophysical Flows, Nishanta Baral May 2023

Theses, Dissertations and Culminating Projects

We consider the dynamics of inertial and non-inertial particles in various flows. We investigate the underlying structures of the flow field by examining their Lagrangian coherent structures (LCS), which are found by computing finite-time Lyapunov exponents (FTLE). We compare the behavior of massless non-inertial particles using the velocity fields from four models, the Duffing oscillator, the Bickley jet, the double-gyre flow, and a quasi-geostrophic geophysical flow model, with that of inertial particles. For inertial particles with finite size and mass, we use the Maxey-Riley equation to describe the particle’s motion. We explore the preferential aggregation of inertial particles and demonstrate …


Parameter Optimization For Excitable Cell Models, Amrit Parmar May 2023

Theses, Dissertations and Culminating Projects

The electrophysiology of nodose ganglia neurons is of great interest in the analysis of cell membrane currents and action potential behavior. This behavior was initially outlined in the Hodgkin-Huxley conductance model [1] using a system of nonlinear differential equations. Later, Schild et al. [2] developed an extension of the Hodgkin-Huxley model to provide a more exhaustive description of ion channels involved in nodose neuronal action potential activity. We consider a variety of methods to fit the parameters of both the Hodgkin-Huxley and Schild et al. models to an empirical stimulus response dataset. Our methods were validated using synthetic datasets, as …


Jackknife Empirical Likelihood Tests For Equality Of Generalized Lorenz Curves, Anton Butenko May 2023

Electronic Theses, Projects, and Dissertations

A Lorenz curve is a graphical representation of the distribution of income or wealth within a population. The generalized Lorenz curve can be created by scaling the values on the vertical axis of a Lorenz curve by the average output of the distribution. In this thesis, we propose two nonparametric methods for testing the equality of two generalized Lorenz curves. Both methods are based on empirical likelihood and utilize a U-statistic. We derive the limiting distribution of the likelihood ratio, which is shown to follow a chi-squared distribution with one degree of freedom. We conduct simulations to compare the …
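The scaling described at the start of this abstract can be sketched for a finite sample. The code below is a minimal illustration using the usual empirical Lorenz curve (cumulative share held by the poorest fraction of the sample); it is not the thesis's test procedure, and the function names `lorenz_points` and `generalized_lorenz_points` are hypothetical.

```python
def lorenz_points(sample):
    """Empirical Lorenz curve as (p, L(p)) pairs, p = k/n for k = 0..n.

    L(k/n) is the share of the total held by the k smallest observations.
    Assumes a non-empty sample with a positive total.
    """
    xs = sorted(sample)
    n = len(xs)
    total = sum(xs)
    points = [(0.0, 0.0)]
    running = 0.0
    for k, x in enumerate(xs, start=1):
        running += x
        points.append((k / n, running / total))
    return points


def generalized_lorenz_points(sample):
    """Generalized Lorenz curve: the Lorenz ordinates scaled by the sample mean."""
    mean = sum(sample) / len(sample)
    return [(p, mean * share) for p, share in lorenz_points(sample)]


if __name__ == "__main__":
    incomes = [1, 2, 3, 4]
    for p, gl in generalized_lorenz_points(incomes):
        print(f"p = {p:.2f}  GL(p) = {gl:.3f}")
```

Because of the scaling, the generalized curve ends at the sample mean rather than at 1, so two populations with the same inequality but different average incomes produce different generalized curves.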


Examining Political Discourse On Online 8kun And Reddit Forums, Braden Mindrum May 2023

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

A recent example of political violence in the United States was that of the January 6, 2021, Capitol attack in connection with the certification of Joseph R. Biden’s victory over Donald J. Trump in the 2020 US presidential election. This thesis analyzes the events of January 6, 2021, through the lens of social media discourse. This thesis presents a workflow that acquired over 5 million 8kun and Reddit posts from various apolitical and political forums in the three months preceding and following the Capitol attack on January 6, 2021. Techniques from text analysis are then used to group forums according …


Effects Of Functional Network Model Definition On Biomarker Outcome Prediction, Xinyang Feng May 2023

Arts & Sciences Electronic Theses and Dissertations

Machine learning (ML) models are widely used to investigate the human connectome and to predict and understand behavior, emotion, and cognition. Prior research has organized pediatric connectome data using adult functional network models. However, this assumes that adult functional network models are appropriate and useful for predicting developmental outcomes from pediatric connectome data. We hypothesize that the application of adult brain network models could result in poor model fit, limiting the generalizability of results. Here, we test whether prediction of biological age is improved by concordant brain network models matching underlying functional connectome data. To quantify the difference in age …


Large Deviations For Self Intersection Local Times Of Ornstein-Uhlenbeck Processes, Apostolos Gournaris May 2023

Doctoral Dissertations

In the area of large deviations, one is concerned with the asymptotic computation of small probabilities on an exponential scale. The general form of large deviations can be roughly described as: P{Y_n ∈ A} ≈ exp{−b_n I(A)} (n → ∞), for a random sequence {Y_n}, a positive sequence b_n with b_n → ∞, and a coefficient I(A) ≥ 0. In applications, one is often concerned with the probability that the random variables take large values, that is, with P{Y_n ≥ λ}, where λ > 0. Here, we consider the Ornstein-Uhlenbeck process, study the properties of the local times and self intersection …
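As a concrete instance of the general form quoted in this abstract (a standard textbook case, not taken from the dissertation), let Y_n be the mean of n independent standard normal random variables. Then b_n = n and the coefficient for the event {Y_n ≥ λ} is λ²/2:

\[
\lim_{n \to \infty} \frac{1}{n} \log P\{Y_n \ge \lambda\} = -\frac{\lambda^2}{2}, \qquad \lambda > 0,
\]

which is the precise meaning of P{Y_n ≥ λ} ≈ exp{−nλ²/2} on an exponential scale.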