Open Access. Powered by Scholars. Published by Universities.®

Applied Statistics Commons

Full-Text Articles in Applied Statistics

Multi-Label Classification Using Conformal Prediction, Chhavi Tyagi Aug 2024

Dissertations

In many machine learning applications, such as image tagging, document classification, and medical diagnosis, a data instance can belong to multiple classes at once. This setting, in which each instance is associated with multiple response variables simultaneously, defines multi-label classification. Standard multi-label classification methods that provide point predictions have been developed, but they do not quantify the uncertainty of those predictions. These methods also fail to account for label dependencies and are very computationally expensive. This dissertation develops two methods of multi-label classification using conformal prediction that quantify the uncertainty of predictions. Chapter 1 introduces notations and tools that have been used in …
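The dissertation's two methods are not reproduced here. For orientation only, the simplest conformal baseline treats each label as its own binary problem and calibrates one threshold per label on held-out data, so each label gets a prediction set drawn from {0, 1}. This baseline ignores label dependence, which is exactly the weakness the abstract raises. The function names and the probability inputs (assumed to come from any fitted classifier) are hypothetical.

```python
import math

def conformal_quantile(scores, alpha):
    """Finite-sample (1 - alpha) quantile used by split conformal prediction."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the required order statistic
    if k > n:
        return math.inf  # too few calibration points: keep every label value
    return sorted(scores)[k - 1]

def calibrate(cal_probs, cal_labels, alpha=0.1):
    """One threshold per label. cal_probs[i][j] is a fitted model's P(label j present | x_i)."""
    thresholds = []
    for j in range(len(cal_labels[0])):
        # Nonconformity score: 1 minus the probability assigned to the observed value of label j.
        scores = [1 - (p[j] if y[j] == 1 else 1 - p[j]) for p, y in zip(cal_probs, cal_labels)]
        thresholds.append(conformal_quantile(scores, alpha))
    return thresholds

def predict_sets(probs, thresholds):
    """For each label, the subset of {0, 1} whose nonconformity score is within the threshold."""
    sets = []
    for p, q in zip(probs, thresholds):
        label_set = set()
        if 1 - p <= q:   # score if the label is present
            label_set.add(1)
        if p <= q:       # score if the label is absent: 1 - (1 - p) = p
            label_set.add(0)
        sets.append(label_set)
    return sets
```

A set containing both 0 and 1 signals that the model is uncertain about that label; the coverage guarantee here holds per label, not jointly across labels.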


The Impact Of “Multiple Looks” When Performing Survival Analysis, Quentin Eloise Aug 2024

Electronic Theses and Dissertations

Survival analysis is a critical statistical method in healthcare for assessing patient treatment effects and disease progression. Another critical area of statistical methodology in healthcare is the practice of adaptive designs. Adaptive designs allow interim analyses to take place during a study, so that decisions and actions can be taken more ethically. This is beneficial for studies that take multiple years to complete and allows administrators and healthcare providers to make sound decisions as early as possible. A challenging aspect of adaptive designs is that the number of interim analyses is known in advance, which is applicable in …
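The problem with "multiple looks" can be seen in a small simulation. This sketch is a simplification: it assumes normally distributed outcomes with known variance and a z-test rather than a survival endpoint, and it tests at the unadjusted two-sided 5% level at every look. The helper name `type_one_error` is hypothetical.

```python
import random
from statistics import NormalDist

def type_one_error(n_looks, n_per_look, n_sims=2000, alpha=0.05, seed=1):
    """Monte Carlo estimate of the overall Type I error when a two-sided z-test
    is repeated at every interim look without any multiplicity adjustment."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # unadjusted critical value, about 1.96
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        total, n = 0.0, 0
        for _ in range(n_looks):
            # Accrue a new batch of outcomes generated under the null (true mean 0, sd 1).
            total += sum(rng.gauss(0.0, 1.0) for _ in range(n_per_look))
            n += n_per_look
            if abs(total / n ** 0.5) > crit:  # z = sum / sqrt(n) when sigma = 1
                rejections += 1
                break  # the trial stops at the first "significant" look
    return rejections / n_sims

for looks in (1, 2, 5):
    print(looks, type_one_error(looks, n_per_look=20))
```

With one look the estimate should sit near 0.05. With five unadjusted looks it rises to roughly 0.14, which is why group sequential designs use adjusted stopping boundaries.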


Bayesian Variational Inference In Keyword Identification And Multiple Instance Classification, Yaofang Hu Aug 2024

Statistical Science Theses and Dissertations

This dissertation investigates (1) Variational Bayesian Semi-supervised Keyword Extraction and (2) Variational Bayesian Multimodal Multiple Instance Classification.

The expansion of textual data, stemming from various sources such as online product reviews and scholarly publications on scientific discoveries, has created a demand for the extraction of succinct yet comprehensive information. As a result, considerable effort has gone into developing novel methodologies for keyword extraction in recent years. Although many methods have been proposed to automatically extract keywords in the contexts of both unsupervised and fully supervised learning, how to effectively use partially observed keywords, such as author-specified keywords, remains an under-explored …


Bayesian And Deep Generative Modeling In Immunology, Yuqiu Yang Aug 2024

Statistical Science Theses and Dissertations

Due to the accumulation of a large volume of data of different natures such as sequencing data, proteomics data, and clinical data, statistical methods and deep learning algorithms have become increasingly important in the field of immunology. By leveraging the diverse datasets as well as interdisciplinary knowledge from areas like biology and public health, these quantitative methods have revolutionized this field by providing powerful tools for data analysis, modeling, and prediction. This has led to a deeper understanding of the immune system, accelerated the development of novel therapies, and paved the way for personalized and precision medicine approaches in immunology. …


Exploring Healthcare Chatbot Information Presentation: Applying Hierarchical Bayesian Regression And Inductive Thematic Analysis In A Mixed Methods Study, Samuel Nelson Koscelny Aug 2024

All Theses

High blood pressure, also known as hypertension, significantly increases the risk of heart disease and stroke, which are leading causes of death in the United States. Hypertension contributed to over 691,000 deaths in the United States (U.S.) in 2021 alone and imposes an immense economic burden on the healthcare system, costing approximately $131 billion annually. One way to address this issue is through increased self-care behaviors and medication adherence, both of which require sufficient health literacy. Despite the importance of health literacy, 90% of U.S. adults struggle with health-related subjects. Overcoming the issues associated with health literacy requires addressing the …


Simulation Study On Confidence Interval Estimation For Standard Deviation With Non-Normal Distributions, Theophilus Oppong Kyeremeh Aug 2024

Electronic Theses and Dissertations

This study explores innovative approaches to constructing confidence intervals for the population standard deviation, σ, in non-normal data scenarios. While the sample standard deviation, s, is widely used, its reliability is compromised when dealing with skewed or heavy-tailed distributions and exhibits sensitivity to outliers. Our research addresses these limitations by investigating alternative estimation methods that offer greater robustness and accuracy.
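The classical interval for σ treats (n − 1)s²/σ² as χ² with n − 1 degrees of freedom, which relies on normality. A common distribution-free alternative is the percentile bootstrap, sketched below. Whether the study includes this particular method is not stated in the abstract, and the function name and defaults are assumptions.

```python
import random
import statistics

def bootstrap_sd_ci(data, level=0.95, n_boot=2000, seed=0):
    """Percentile bootstrap confidence interval for the population standard deviation."""
    rng = random.Random(seed)
    n = len(data)
    # Recompute the sample standard deviation on many resamples drawn with replacement.
    boot = sorted(statistics.stdev(rng.choices(data, k=n)) for _ in range(n_boot))
    lo_idx = int((1 - level) / 2 * n_boot)        # e.g. the 2.5th percentile
    hi_idx = int((1 + level) / 2 * n_boot) - 1    # e.g. the 97.5th percentile
    return boot[lo_idx], boot[hi_idx]

# Usage on a skewed (exponential) sample, where the chi-square interval is unreliable.
sample_rng = random.Random(42)
skewed = [sample_rng.expovariate(1.0) for _ in range(60)]
print(bootstrap_sd_ci(skewed))
```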


High Fat Diet & Social Isolation: Interactive Effects On Pain, Cognition, & Neuroinflammation, Ian M. Campuzano Aug 2024

Research Psychology Theses

Prior research has established a role for both social isolation and exposure to high-fat Western diets in altering a range of behaviors, from reduced memory performance to increased depression-like behaviors. The present study scrutinizes the interplay among these variables during the peri-adolescent developmental phase, using Long-Evans rats as the experimental model. Our overarching hypothesis is that exposure to social isolation, a high-fat diet, or both will result in heightened pain sensitivity, diminished cognitive flexibility, and increased neuroinflammatory responses within brain regions implicated in sociability, cognition, memory, and pain processing. Behavioral flexibility will be assessed using a maze-based …


A Uniformly Most Powerful Test For The Mean Of A Beta Distribution, Richard Ntiamoah Kyei Aug 2024

Electronic Theses and Dissertations

The beta distribution is used in numerous real-world applications, including areas such as manufacturing (quality control) and analyzing patient outcomes in health care. It also plays a key role in statistical theory, including multivariate analysis of variance (MANOVA) and Bayesian statistics. It is a flexible distribution that can account for many different characteristics of real data. To our surprise, there has been very little work or discussion on performing statistical hypothesis testing for the mean when it is reasonable to assume that the population is beta distributed. Many analysts conduct traditional analyses using a t-test or nonparametric approach, try transformations, …


Visualization Of Species Tree Likelihood Under The Multispecies Coalescent Model, Jaimasan Sutton Jul 2024

Mathematics & Statistics ETDs

A commonly used tool for evolutionary biologists is a phylogenetic tree that represents the ancestry of a set of species and the evolution of traits. Statistical models can be used to predict the probabilities of gene trees which represent ancestral relationships of genes sampled from species. Because of this, we are able to represent the likelihood of a species tree, which represents the evolutionary history of a set of species, as a function of the counts of gene tree topologies, where each gene tree represents the ancestry of a specific genetic locus for multiple species. Because we can represent these …
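For the smallest case the idea can be made concrete. Under the multispecies coalescent, for three taxa with species tree ((A,B),C) and internal branch length t in coalescent units, the matching rooted gene tree has probability 1 − (2/3)e^(−t), and each of the two other rooted topologies has probability (1/3)e^(−t). The likelihood is therefore a multinomial function of the three topology counts. The sketch evaluates it over a grid of t values, which yields a curve that could be plotted. The thesis treats general species trees; the three-taxon restriction and the function names are assumptions made here.

```python
import math

def triplet_log_likelihood(t, n_match, n_other1, n_other2):
    """Multinomial log-likelihood (up to an additive constant) of rooted gene-tree
    counts for three taxa, given species tree ((A,B),C) with internal branch
    length t > 0 in coalescent units."""
    p_other = math.exp(-t) / 3   # probability of each of the two discordant topologies
    p_match = 1 - 2 * p_other    # probability of the topology matching the species tree
    return n_match * math.log(p_match) + (n_other1 + n_other2) * math.log(p_other)

def likelihood_curve(counts, t_grid):
    """(t, log-likelihood) pairs, ready to be plotted as a one-dimensional likelihood slice."""
    return [(t, triplet_log_likelihood(t, *counts)) for t in t_grid]

grid = [i / 100 for i in range(1, 301)]
curve = likelihood_curve((60, 20, 20), grid)
print(max(curve, key=lambda pair: pair[1]))
```

With counts (60, 20, 20) the matching proportion is 0.6, so the maximum should fall near t = −ln(0.6) ≈ 0.51.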


Evaluating Past Progress And Assessing Prediction Breeding Strategies For Sustained Genetic Gains In The Louisiana Sugarcane Variety Development Program, Brayden A. Blanchard Jun 2024

LSU Doctoral Dissertations

The aim of this dissertation is to outline important considerations for the Louisiana Sugarcane Variety Development Program (LSVDP) as it pertains to historical progress, impact, goal setting, and new strategies for continued genetic gains. Industry progress was evaluated with robust regression models to quantify rates of productivity gains. Over the last 50 years, statistically significant productivity gains were identified in sucrose content (45%), cane yield (32.2%), and sugar yield (93%) while pairwise comparisons of decades showed that progress was incremental rather than rapid and sustained once achieved. The decade from 1990-1999 was identified as the only decade with a significant …
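The abstract does not say which robust regression estimator was used. As one illustration, the Theil-Sen estimator takes the median of all pairwise slopes, so a few anomalous years barely move the fitted trend. The function name and inputs (for example, year versus yield) are assumptions.

```python
from itertools import combinations
from statistics import median

def theil_sen(x, y):
    """Theil-Sen robust line: the median of all pairwise slopes, then a median intercept."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(x, y), 2)
              if x2 != x1]  # skip pairs with identical x values
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept
```

An ordinary least-squares fit would be pulled noticeably by a single outlying year; here the slope stays at the value shared by the majority of pairs.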


Examining The Interaction Between Calcium Supplement Use, Demographics, And Lifestyle Factors On Bone Health In Women, Vix Talbot Jun 2024

University Honors Theses

Osteoporosis is a condition that poses a significant health threat, particularly among women during the menopause transition, when accelerated bone loss increases fracture risk. Calcium supplementation has been shown to be an important intervention for mitigating bone mineral density (BMD) decline during this and other periods of life. However, the efficacy of calcium supplementation is influenced by various individual factors, including demographics and lifestyle habits. This study investigates how calcium supplement use interacts with several demographic and lifestyle factors, modeled as interaction terms, to affect bone health in women. Multiple linear regression analysis is employed to assess the impact of these factors on BMD. Data from …


Capturing Latent Abilities And Latent Capacities Of Professional Golfers Using Nonlinear Mixed Effects Growth Modeling, Mac Wetherbee Jun 2024

Electronic Theses and Dissertations

This study demonstrates an effective and innovative approach to measuring the latent athletic abilities and capacities of professional golfers. I used nonlinear mixed effects growth modeling (e.g., Dynamic Measurement Modeling) to measure professional golfers’ ability levels and capacities for improvement. I accomplished this using a two-stage modeling approach. First, a crossed linear mixed effects model estimated each player’s ability level in each year. In the second stage, I used the results from the first stage to estimate several candidate nonlinear growth trajectories for players’ abilities over time. The quadratic growth trajectory was the best-fitting of these trajectories and was used …


The Application Of Elastic Distance In Astrophysical Time Series, Xiyang Zhang Jun 2024

Electronic Thesis and Dissertation Repository

Elastic distances, e.g., dynamic time warping (DTW), evaluate the similarity between query and reference sequences by dynamic programming. The 1-nearest-neighbor predictor with DTW is a standard benchmark in time series classification. However, DTW is less effective on astronomical time series because it ignores the information in the time stamps and depends on the shape and magnitude of the query and reference sequences. We apply two elastic distances that integrate information from the time domain, the time warp edit distance (TWED) and the Skorohod distance, which is calculated using the Fréchet distance, to three astronomical datasets to compare with DTW and Euclidean …
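The DTW benchmark described above can be sketched in a few lines: a dynamic-programming table of cumulative alignment costs, plus a 1-nearest-neighbor classifier on top. The absolute difference as the local cost and the function names are choices made here. Note that the sketch uses only the observed values, never the time stamps, which is the limitation the thesis addresses with TWED and the Skorohod distance (not shown).

```python
def dtw_distance(a, b):
    """Classic DTW distance between sequences a and b with absolute-difference local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def one_nn_predict(query, references):
    """references: list of (series, label) pairs; returns the label of the DTW-nearest series."""
    return min(references, key=lambda ref: dtw_distance(query, ref[0]))[1]
```

Because warping lets one point match several, `dtw_distance([0, 1, 2], [0, 1, 1, 2])` is 0 even though the sequences differ in length.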


Qwixx Strategies Using Simulation And MCMC Methods, Joshua W. Blank Jun 2024

Master's Theses

This study explores optimal strategies for maximizing scores and winning in the popular dice game Qwixx, analyzing both single and multiplayer gameplay scenarios. Through extensive simulations, various strategies were tested and compared, including a score-based approach that uses a formula tuned by MCMC random walks, and race-to-lock approaches that use absorbing Markov chain properties of individual score sheet rows to find ways to lock rows as quickly as possible. Results indicate that employing a score-based strategy, considering gap, count, position, skip, and likelihood scores, significantly improves performance in single player games, while move restrictions based on specific dice roll sums …
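The absorbing-chain quantity behind a race-to-lock idea is the expected number of steps before absorption. With Q the transition matrix among transient states, it solves (I − Q)t = 1. The sketch solves that system by Gauss-Jordan elimination. The toy chain in the usage note ("advance one stage with probability p per roll, lock after k stages") is a hypothetical stand-in, not the thesis's model of a Qwixx row.

```python
def expected_steps_to_absorption(Q):
    """Q[i][j]: transition probability from transient state i to transient state j.
    Solves (I - Q) t = 1, so t[i] is the expected number of steps before absorption from i."""
    n = len(Q)
    # Augmented matrix [I - Q | 1].
    A = [[(1.0 if i == j else 0.0) - Q[i][j] for j in range(n)] + [1.0] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

# Hypothetical toy chain: three stages, each advanced with probability p per roll.
p = 0.25
Q = [[1 - p, p, 0.0], [0.0, 1 - p, p], [0.0, 0.0, 1 - p]]
print(expected_steps_to_absorption(Q))
```

For this chain the answer from the first stage is k/p = 3/0.25 = 12 rolls, which makes a convenient check.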


Recursive Matrix Game Analysis: Optimal, Simplified, And Human Strategies In Brave Rats, William A. Medwid Jun 2024

Master's Theses

Brave Rats is a short game with simple rules, yet establishing a comprehensive strategy is very challenging without extensive computation. After explaining the rules, this paper begins by calculating the optimal strategy by recursively solving each turn’s Minimax strategy. It then provides summary statistics about the complex, branching Minimax solution. Next, we examine six other strategy models and evaluate their performance against each other. These models’ flaws highlight the key elements that contribute to the effectiveness of the Minimax strategy and offer insight into simpler strategies that human players could mimic. Finally, we analyze 123 games of human data collected …
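Each turn of a recursive analysis is a zero-sum matrix game whose entries are the values of the resulting subgames. The sketch solves only the 2×2 case in closed form: it checks for a saddle point first, and otherwise makes the opponent indifferent between columns. Brave Rats turns involve larger matrices, which generally need linear programming, so this is an illustration of the idea rather than the thesis's solver. The function name is hypothetical.

```python
def solve_2x2(payoff):
    """Row player's optimal probability of playing row 0, and the game value,
    for a 2x2 zero-sum game; payoff[i][j] is the row player's payoff."""
    (a, b), (c, d) = payoff
    # Pure saddle point: an entry that is the minimum of its row and the maximum of its column.
    for i in range(2):
        for j in range(2):
            v = payoff[i][j]
            if v == min(payoff[i]) and v == max(payoff[0][j], payoff[1][j]):
                return (1.0 if i == 0 else 0.0), v
    # No saddle point: choose p so that p*a + (1-p)*c == p*b + (1-p)*d.
    denom = a - b - c + d
    p = (d - c) / denom
    value = (a * d - b * c) / denom
    return p, value
```

For matching pennies, `solve_2x2([[1, -1], [-1, 1]])` gives a 50/50 mix and value 0. In a recursive solver, each entry would itself be the value returned for the next game state.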


Using Plankton eDNA To Estimate Whale Abundances Off The California Coast: Data Integration And Statistical Modeling, Katherine Chan Jun 2024

Master's Theses

Understanding marine mammal populations and how they are affected by human activity and ocean conditions is vital, especially in tracking population declines and monitoring endangered species. However, tracking marine mammal populations and their distribution is challenging due to difficulties in observation and costs. Using surrounding plankton environmental DNA (eDNA) has the potential to provide an indirect measure of monitoring cetacean abundances based on ecological associations. This project aims to apply statistical methods to assess the relationship of visual abundances of common species of baleen whales with amplicon sequence variants (ASV) of plankton eDNA samples from the NOAA-CalCOFI Ocean Genomics (NCOG) …


Causal Inference Using Bayesian Network For Search And Rescue, Amanda Belden Jun 2024

Master's Theses

In Search and Rescue (SAR) missions, people who are considered missing have much higher probabilities of being found dead than those who are not considered missing. Dementia patients are especially likely to be declared missing; in fact, after those with dementia are removed, the probability that a mission is regarded as a missing-person case is only about 10%. Additionally, those who go missing are much more likely to be on private land than in protected areas such as forests and parks. These and similar associations can be represented and investigated using a Bayesian network that has been …


Unraveling The History Of Deforestation In The Amazon Rainforest With Statistical Modeling, Ryan Destefano Jun 2024

Master's Theses

The Amazon rainforest, a vital ecosystem of immense biodiversity and global climate significance, faces the ongoing threat of deforestation driven by agricultural expansion. This thesis employs remote sensing techniques, focusing on the Enhanced Vegetation Index (EVI) derived from Landsat satellite imagery, to track land cover dynamics within the Amazon. The study examines historical land cover changes in current plantations in Peru and Brazil, regions where the exact timing of deforestation is uncertain. By analyzing EVI measurements dating back to 1984, inflection points indicative of deforestation events preceding plantation establishment are identified. Statistical modeling techniques, including spline fitting to analyze time …


Automatic Appraisals Of Houses, Robert Sloan Scroggin May 2024

Graduate Theses and Dissertations

Multiple hedonic models and an automatic appraiser model were used to estimate a residential house's sales price. The goal is to use the limited data available to a REALTOR® to estimate the future sales price of a residential home without the aid of pictures of the property or viewing the physical property. The first model automates some of the actions of an appraiser by finding comparable sales based on proximity, measured both by the distance between houses and by the similarity of their characteristics, and then calculating a weighted average price as the estimated sales price of future sales. If the model …
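The comparable-sales step can be sketched as a k-nearest-neighbor average weighted by inverse distance. The abstract does not give the thesis's distance or weighting formula, so the Euclidean distance over numeric features, the inverse-distance weights, and the feature names used in the check are all assumptions. Features are assumed to be pre-scaled to comparable units, since otherwise one feature dominates the distance.

```python
import math

def estimate_price(subject, sales, k=3):
    """Inverse-distance-weighted average price of the k most similar past sales.
    subject: dict of numeric features; sales: list of (features_dict, price) pairs."""
    def distance(f1, f2):
        # Euclidean distance over the subject's features (assumed pre-scaled).
        return math.sqrt(sum((f1[key] - f2[key]) ** 2 for key in subject))

    comps = sorted(sales, key=lambda sale: distance(subject, sale[0]))[:k]
    weighted = []
    for features, price in comps:
        d = distance(subject, features)
        if d == 0:
            return price  # an identical comparable: use its price directly
        weighted.append((1 / d, price))
    total = sum(w for w, _ in weighted)
    return sum(w * p for w, p in weighted) / total
```

Closer comparables pull the estimate harder; distant ones outside the top k are ignored entirely.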


An Analysis Of Lyrical Repetition And Popularity In Popular Music Genres, Josh White May 2024

Undergraduate Honors Capstone Projects

This paper examines the correlation between repetitiveness and popularity in the genres of Christian, Country, EDM, Hip-Hop, Latin, Pop, R&B, and Rock. Repetitiveness is defined by the frequency of repeated words in lyrics, and the average number of streams per day defines popularity. This analysis also acknowledges the "popularity" metric provided by Spotify in calculating the correlation. To calculate this correlation, I wrote a program that accesses the Spotify and Genius APIs to gather metadata related to 76,069 songs from 1,246 artists, including data on repetitiveness, tempo, duration, and Spotify's audio metrics of "danceability," "energy," "speechiness," "acousticness," and "instrumentalness." I …
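The abstract defines repetitiveness only as "the frequency of repeated words in lyrics", so the exact formula is an assumption here: the share of word tokens that repeat an earlier token, that is, one minus unique words over total words. The correlation is the ordinary Pearson coefficient, written out so it runs on any Python 3 version. The tokenizer and function names are hypothetical.

```python
import math
import re
from statistics import mean

def repetition_ratio(lyrics):
    """Share of word tokens that repeat an earlier token (0 means no word repeats)."""
    words = re.findall(r"[a-z']+", lyrics.lower())
    if not words:
        return 0.0
    return 1 - len(set(words)) / len(words)

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

In use, one would compute `repetition_ratio` per song and pass the list together with streams per day to `pearson`, separately for each genre.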


Stability Of Quantum Computers, Samudra Dasgupta May 2024

Doctoral Dissertations

Quantum computing's potential is immense, promising super-polynomial reductions in execution time, energy use, and memory requirements compared to classical computers. This technology has the power to revolutionize scientific applications such as simulating many-body quantum systems to understand molecular structure, factoring large integers, and enhancing machine learning, and in the process to disrupt industries like telecommunications, material science, pharmaceuticals, and artificial intelligence. However, quantum computing's potential is curtailed by noise, further complicated by non-stationary noise parameter distributions across time and qubits. This dissertation focuses on the persistent issue of noise in quantum computing, particularly non-stationarity of noise parameters in transmon processors. It …


A Novel Correction For The Multivariate Ljung-Box Test, Minhao Huang May 2024

Computational and Data Sciences (PhD) Dissertations

This research introduces an analytical improvement to the Multivariate Ljung-Box test that addresses significant deviations of the original test from the nominal Type I error rates under almost all scenarios. Prior attempts to mitigate this issue have been directed at modifying the test statistic or correcting the test distribution to achieve precise results in finite samples. In previous studies focused on designing corrections to the univariate Ljung-Box test, a method that specifically adjusts the test's rejection region has been the most successful at attaining Type I error rates close to nominal. We adopt the same approach for the more complex, …
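For reference, the univariate statistic that these corrections start from is Q = n(n + 2) Σ_{k=1}^{h} r_k² / (n − k), where r_k is the lag-k sample autocorrelation. Under the null of no autocorrelation, Q is compared with a χ² distribution with h degrees of freedom, reduced by the number of fitted ARMA parameters when applied to model residuals. The sketch computes Q only; neither the multivariate statistic nor the thesis's correction is shown, and the function name is hypothetical.

```python
def ljung_box_q(x, h):
    """Univariate Ljung-Box statistic Q = n(n+2) * sum_{k=1}^{h} r_k^2 / (n - k)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, h + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom  # lag-k sample autocorrelation
        q += r_k ** 2 / (n - k)
    return n * (n + 2) * q
```

The finite-sample problem the thesis targets arises because the χ² approximation to this statistic can be poor, so rejection rates drift away from the nominal level.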


Cost-Risk Analysis Of The ERCOT Region Using Modern Portfolio Theory, Megan Sickinger May 2024

Master's Theses

In this work, we study the use of modern portfolio theory in a cost-risk analysis of the Electric Reliability Council of Texas (ERCOT). Based on the risk-return concepts of modern portfolio theory, we develop an n-asset minimization problem to create a risk-cost frontier of portfolios of technologies within the ERCOT electricity region. The levelized cost of electricity for each technology in the region is used in evaluating the expected cost of the portfolio, and historical data on cost factors are used to estimate the variance of cost for each technology. In addition, there are several constraints in our minimization problem to …
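The two quantities behind a risk-cost frontier are the portfolio's expected cost, the weighted sum of per-technology mean costs, and its risk, the square root of wᵀΣw. The sketch traces the frontier for just two technologies by sweeping the weight; the thesis's n-asset optimization and its constraints are not reproduced, and any numbers passed in would be hypothetical.

```python
def portfolio_cost_and_risk(w, mean_cost, cov):
    """Expected cost (w . mean_cost) and cost standard deviation sqrt(w^T cov w)."""
    n = len(w)
    expected = sum(w[i] * mean_cost[i] for i in range(n))
    variance = sum(w[i] * w[j] * cov[i][j] for i in range(n) for j in range(n))
    return expected, variance ** 0.5

def frontier_two_assets(mean_cost, cov, steps=10):
    """(expected cost, risk) pairs as the share of technology 0 moves from 0 to 1."""
    return [portfolio_cost_and_risk([k / steps, 1 - k / steps], mean_cost, cov)
            for k in range(steps + 1)]

# Hypothetical inputs: two uncorrelated technologies with equal cost variance.
for point in frontier_two_assets([1.0, 3.0], [[1.0, 0.0], [0.0, 1.0]]):
    print(point)
```

With equal variances and zero covariance, risk is lowest at a 50/50 mix, while expected cost falls steadily as the cheaper technology's share grows. That tension is what the frontier displays.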


Representation Learning For Generative Models With Applications To Healthcare, Astronautics, And Aviation, Van Minh Nguyen May 2024

Theses and Dissertations

This dissertation explores applications of representation learning and generative models to challenges in healthcare, astronautics, and aviation.

The first part investigates the use of Generative Adversarial Networks (GANs) to synthesize realistic electronic health record (EHR) data. An initial attempt at training a GAN on the MIMIC-IV dataset encountered stability and convergence issues, motivating a deeper study of 1-Lipschitz regularization techniques for Auxiliary Classifier GANs (AC-GANs). An extensive ablation study on the CIFAR-10 dataset found that Spectral Normalization is key for AC-GAN stability and performance, while Weight Clipping fails to converge without Spectral Normalization. Analysis of the training dynamics provided further …
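Spectral Normalization, which the ablation found key for AC-GAN stability, divides each weight matrix by an estimate of its largest singular value. This bounds the layer's Lipschitz constant at about 1. The estimate comes from power iteration. The pure-Python sketch runs many iterations on a single matrix. Practical implementations typically run one iteration per training step and keep the vector between steps; that detail and the function names here are outside what the abstract states.

```python
import math
import random

def spectral_norm(W, n_iter=50, seed=0):
    """Estimate the largest singular value of W (a list of rows) by power iteration."""
    rng = random.Random(seed)
    rows, cols = len(W), len(W[0])
    v = [rng.gauss(0, 1) for _ in range(cols)]
    for _ in range(n_iter):
        u = [sum(W[i][j] * v[j] for j in range(cols)) for i in range(rows)]   # u = W v
        u_norm = math.sqrt(sum(x * x for x in u))
        u = [x / u_norm for x in u]
        v = [sum(W[i][j] * u[i] for i in range(rows)) for j in range(cols)]   # v = W^T u
        v_norm = math.sqrt(sum(x * x for x in v))
        v = [x / v_norm for x in v]
    Wv = [sum(W[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(x * x for x in Wv))  # sigma is approximately ||W v|| at convergence

def spectrally_normalize(W):
    """Divide W by its estimated spectral norm so its largest singular value is about 1."""
    sigma = spectral_norm(W)
    return [[x / sigma for x in row] for row in W]
```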


Efficient Fully Bayesian Approaches To Brain Activity Mapping With Complex-Valued fMRI Data: Analysis Of Real And Imaginary Components In A Cartesian Model And Extension To Magnitude And Phase In A Polar Model, Zhengxin Wang May 2024

All Dissertations

Functional magnetic resonance imaging (fMRI) plays a crucial role in neuroimaging, enabling the exploration of brain activity through complex-valued signals. Traditional fMRI analyses have largely focused on magnitude information, often overlooking the potential insights offered by phase data, which leads to underutilization of available data and flawed statistical assumptions. This dissertation proposes two efficient, fully Bayesian approaches for the analysis of complex-valued functional magnetic resonance imaging (cv-fMRI) time series.

Chapter 2 introduces the model, referred to as CV-sSGLMM, using the real and imaginary components of cv-fMRI data and sparse spatial generalized linear mixed model prior. This model extends the …


Evaluation Of Regression Methods And Competition Indices In Characterizing Height-Diameter Relationships For Temperate And Pantropical Tree Species, Sakar Jha May 2024

Masters Theses

Height-diameter relationship models, denoted as H-D models, have important applications in sustainable forest management, such as studying the vertical structure of a forest stand, understanding habitat heterogeneity for wildlife niches, and analyzing growth-rate patterns to inform decisions about silvicultural treatments. Compared to monocultures, characterizing allometric relationships for uneven-aged, mixed-species forests, especially tropical forests, is more challenging and has historically received less attention. Modelling how the competitive interactions between trees of varying sizes and multiple species affect these relationships adds a high degree of complexity. In this study, five regression methods and five distance-independent competition indices were evaluated for …
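A widely used distance-independent competition index is the basal area of larger trees (BAL): for each tree, the summed basal area of all trees in the plot with a larger diameter, expressed per hectare. It needs no tree coordinates, which is what "distance-independent" means. Whether BAL is among the five indices the thesis evaluated is not stated in the abstract; the units (DBH in cm, plot area in ha) and function names are assumptions.

```python
import math

def basal_area_m2(dbh_cm):
    """Basal area in m^2 of a tree with diameter at breast height dbh_cm (in cm)."""
    return math.pi * (dbh_cm / 200) ** 2   # radius in metres = dbh_cm / 200

def bal_index(dbhs_cm, plot_area_ha):
    """Basal area of larger trees (m^2/ha) for every tree in a plot."""
    return [sum(basal_area_m2(d) for d in dbhs_cm if d > subject) / plot_area_ha
            for subject in dbhs_cm]
```

The largest tree in a plot always gets BAL = 0, and suppressed understory trees get the highest values, which is the size-asymmetric competition such indices try to capture in H-D models.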


A Causal Inference Approach For Spike Train Interactions, Zach Saccomano Feb 2024

Dissertations, Theses, and Capstone Projects

Since the 1960s, neuroscientists have worked on the problem of estimating synaptic properties, such as connectivity and strength, from simultaneously recorded spike trains. Recent years have seen renewed interest in the problem coinciding with rapid advances in experimental technologies, including an approximate exponential increase in the number of neurons that can be recorded in parallel and perturbation techniques such as optogenetics that can be used to calibrate and validate causal hypotheses about functional connectivity. This thesis presents a mathematical examination of synaptic inference from two perspectives: (1) using in vivo data and biophysical models, we ask in what cases the …


Statistical Consulting In Academia: A Review, Ke Xiao Jan 2024

Major Papers

This paper reviews the state of statistical consulting in academia by performing a literature review on this topic in chapters 1 and 2. Chapter 1 overviews general aspects of statistical consulting and types of centers that conduct such services in academia. In Chapter 2 we summarise the literature about the common logistics and processes for conducting statistical consulting in academia. In Chapters 3 and 4, we analyze data on statistical consulting centers for the largest 100 universities in the USA. We also review the literature on the future of statistical consulting in academia in the era of big data and …


Judging Our New Judges: Why We Must Remove Artificial Intelligence From Our Courtrooms Now, Kieran Duffy Newcomb Jan 2024

Honors Theses and Capstones

In this paper, I explore some of the ways in which artificial intelligence might enhance the sentencing process through recidivism prediction technology. Notably, this technology can increase the accuracy of risk predictions and the speed with which sentencing decisions are reached. I then show, however, that the recidivism prediction technology is likely to turn into what data scientist Cathy O’Neil calls a Weapon of Math Destruction. The potential harmfulness of this technology is due not to the inherent nature of the technology, but the symbiotic relationship it will have with our already harmful criminal justice system. I argue that the …


Accounting For Variability Due To Resampling Using Bootstrapping, Dipendra Phuyal Jan 2024

Electronic Theses and Dissertations

Bradley Efron (1979) introduced bootstrapping. Typically, a researcher is interested in studying a process that generates individuals. The collection of individuals the process has (actual) or could have (conceptual) generated is the population. The collection of conceptual members of the population is an uncountable collection. Hence, the population is an uncountable collection of individuals. The collection of individuals the process has generated (actual individuals) is representative of what the process can generate and will be referred to as the representative sample. The size of this sample is a nonnegative integer-valued random variable N, which may be a constant random variable such as in …
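The "variability due to resampling" in the title can be observed directly. Running the same bootstrap with different random seeds gives different standard-error estimates, and the spread shrinks as the number of resamples grows. The sketch uses the plain nonparametric bootstrap of the mean with a fixed sample size; the thesis's framework with a random sample size N is not reproduced, and the function name is hypothetical.

```python
import random
import statistics

def bootstrap_se_mean(sample, n_boot, seed):
    """Bootstrap estimate of the standard error of the sample mean."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.choices(sample, k=len(sample))) for _ in range(n_boot)]
    return statistics.stdev(means)

sample = list(range(1, 31))
for n_boot in (50, 1000):
    estimates = [bootstrap_se_mean(sample, n_boot, seed) for seed in range(20)]
    # Spread of the SE estimate across seeds: Monte Carlo error from resampling alone.
    print(n_boot, statistics.stdev(estimates))
```

The data never change between runs, so any spread in the printed estimates is resampling error alone, not sampling error.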