Open Access. Powered by Scholars. Published by Universities.®

Applied Statistics Commons

Articles 1 - 30 of 636

Full-Text Articles in Applied Statistics

Formulating An Efficient Statistical Test Using The Goodness Of Fit Approach With Applications To Real-Life Data, S. A. Qaid, S. E. Abo Youssef Prof., Mahmoud Mansour Jan 2024

Basic Science Engineering

Statistical tests are essential tools for decision-making in research. Non-parametric tests are especially important because they can be applied to a wide range of data sets without knowledge of the underlying distribution. Researchers therefore seek efficient tests on which to base good decisions. In this study, the NBU(2)L class was used with the goodness-of-fit approach to construct an efficient statistical test. The efficiency of the proposed test was computed and compared with that of other tests. Critical …


Machine Learning Approaches For Cyberbullying Detection, Roland Fiagbe Jan 2024

Data Science and Data Mining

Cyberbullying refers to bullying carried out through electronic means and the internet. In recent years, it has been identified as a major problem among young people and even adults. It can negatively affect one’s emotions and lead to adverse outcomes such as depression, anxiety, harassment, and suicide. This has created a need for machine learning techniques that automatically detect cyberbullying and help prevent it on social media platforms. In this study, we analyze the combination of some Natural Language Processing (NLP) algorithms (such as Bag-of-Words and TFIDF) with some popular machine learning …
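The Bag-of-Words and TF-IDF representations named above can be sketched in a few lines of Python. This is a generic illustration, not the study's pipeline: the whitespace tokenizer, the toy documents, and the plain idf(t) = log(N / df(t)) weighting are assumptions, and libraries differ in the smoothing and normalization they apply.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Bag-of-Words: a sorted vocabulary plus raw term counts per document.
    Tokenization is a plain lowercase whitespace split (an assumption)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    counts = []
    for toks in tokenized:
        c = Counter(toks)
        counts.append([c[term] for term in vocab])
    return vocab, counts


def tfidf(docs):
    """TF-IDF: scale each raw count by idf(t) = log(N / df(t)), where N is the
    number of documents and df(t) the number of documents containing t."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weights = [[tf * w for tf, w in zip(vec, idf)] for vec in counts]
    return vocab, weights


docs = ["you are stupid", "you are kind", "stupid stupid post"]
vocab, counts = bag_of_words(docs)
_, weights = tfidf(docs)
print(vocab)      # ['are', 'kind', 'post', 'stupid', 'you']
print(counts[2])  # [0, 0, 1, 2, 0]
```

A term that occurs in every document gets idf 0, so TF-IDF down-weights common words relative to rarer, more distinctive ones, while Bag-of-Words keeps raw counts.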


Predicting Superconducting Critical Temperature Using Regression Analysis, Roland Fiagbe Jan 2024

Data Science and Data Mining

This project estimates a regression model to predict the superconducting critical temperature from variables extracted from the superconductor’s chemical formula. Combined with stepwise variable selection, the regression model gives a reasonably good predictive model with a low prediction error (mean squared error, MSE). Variables extracted from atomic radius, valence, atomic mass, and thermal conductivity contributed the most to the predictive model.
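The abstract does not specify which stepwise variant was used. The sketch below assumes forward selection: starting from an intercept-only model, it repeatedly adds the predictor that most reduces the in-sample mean squared error and stops when the reduction falls below a hypothetical threshold `min_gain`. It uses only the standard library, solving the normal equations directly; the toy data are illustrative.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_mse(X, y, cols):
    """In-sample MSE of an OLS fit of y on an intercept plus the columns in cols."""
    rows = [[1.0] + [x[j] for j in cols] for x in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    return sum(e * e for e in resid) / len(y)


def forward_stepwise(X, y, min_gain=1e-6):
    """Forward selection: add the predictor that lowers MSE the most; stop when
    the best available reduction is smaller than min_gain (hypothetical rule)."""
    selected, remaining = [], list(range(len(X[0])))
    best_mse = ols_mse(X, y, selected)
    while remaining:
        mse, j = min((ols_mse(X, y, selected + [j]), j) for j in remaining)
        if best_mse - mse < min_gain:
            break
        selected.append(j)
        remaining.remove(j)
        best_mse = mse
    return selected, best_mse


# Toy data: y = 2 + 3*x0 - x2 exactly; x1 is not used to build y.
X = [[1, 3, 2], [2, 1, 7], [3, 4, 1], [4, 1, 8], [5, 5, 2], [6, 9, 8], [7, 2, 1], [8, 6, 8]]
y = [3, 1, 10, 6, 15, 12, 22, 18]
print(forward_stepwise(X, y)[0])  # [0, 2]
```

Because in-sample MSE never increases as variables are added, practical implementations usually stop on a penalized criterion such as AIC or BIC, or on held-out MSE, rather than on a raw threshold.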


Is The Declining Birthrate Really An Issue For The Economy?, Harsh Ramesh Pednekar, Theodore Lee, Darrion Chin Dec 2023

Introduction to Research Methods RSCH 202

This study aims to explore the complex implications of declining birth rates for the economy, focusing on GDP per capita as a key metric, and to uncover, through regression analysis, both the potential opportunities and the challenges stemming from this demographic transformation. Using a quantitative methodology and secondary data from OECD.stat, World Population Review, and the World Bank, the study examines the relationship between declining birth rates and economic outcomes. GDP per capita serves as the dependent variable, and the analysis controls for labour force participation, literacy and education levels, the child dependency ratio, and physical capital. Past studies …


The Impact Of Neighborhood Socioeconomic Disadvantage On Operative Outcomes After Single-Level Lumbar Fusion, Grace Y. Ng, Ritesh Karsalia, Ryan S. Gallagher, Austin J. Borja, Jianbo Na, Scott Mcclintock, Neil R. Malhotra Dec 2023

Mathematics Faculty Publications

INTRODUCTION: The relationship between socioeconomic status and neurosurgical outcomes has been investigated with respect to insurance status or median household income, but few studies have considered more comprehensive measures of socioeconomic status. This study examines the relationship between Area Deprivation Index (ADI), a comprehensive measure of neighborhood socioeconomic disadvantage, and short-term postoperative outcomes after lumbar fusion surgery. METHODS: 1861 adult patients undergoing single-level, posterior-only lumbar fusion at a single, multihospital academic medical center were retrospectively enrolled. An ADI matching protocol was used to identify each patient's 9-digit zip code and the zip code-associated ADI data. Primary outcomes included 30- and …


A Classical Fall Statistics Problem, Timothy L. Meyer Oct 2023

Cornhusker Economics

An evaluation of traditional baseball measures and suggestions for alternatives, centering on statistics related to the offensive quality of a player.


Multi-Representation Variational Autoencoder Via Iterative Latent Attention And Implicit Differentiation, Nhu Thuat Tran, Hady Wirawan Lauw Oct 2023

Research Collection School Of Computing and Information Systems

Variational Autoencoder (VAE) offers non-linear probabilistic modeling of users' preferences. While it has achieved remarkable performance in collaborative filtering, it typically samples a single vector to represent a user's preferences, which may be insufficient to capture the user's diverse interests. Existing solutions extend VAE to model multiple interests of users by resorting to a variant of the self-attentive method, i.e., employing prototypes to group items into clusters, each capturing one topic of a user's interests. Despite showing improvements, the current design could be more effective, since prototypes are randomly initialized and shared across users, resulting in uninformative and non-personalized clusters. To fill the gap, …


Exploring Experimental Design And Multivariate Analysis Techniques For Evaluating Community Structure Of Bacteria In Microbiome Data, Kelsey Karnik Aug 2023

Department of Statistics: Dissertations, Theses, and Student Work

The gut microbiome plays a crucial role in human health, and by working collaboratively with microbiologists, we aim to further our understanding of the human gut and its impact on human health. Promoting a diverse microbiome is emphasized throughout microbiology literature, and involving a statistician in designing experiments to relate gut bacteria and some measured health outcome is crucial for ensuring valid and accurate results. By adopting new experimental design and analysis methods, researchers can begin to gain a deeper understanding of how the genetics of our food affect the composition of taxa within the gut microbiome. This dissertation is …


Sentiment Analysis Before And During The Covid-19 Pandemic, Emily Musgrove Jul 2023

Mathematics Summer Fellows

This study examines the change in connotative language use before and during the Covid-19 pandemic. By analyzing news articles from several major US newspapers, we found that there is a statistically significant correlation between the sentiment of the text and the publication period. Specifically, we document a large, systematic, and statistically significant decline in the overall sentiment of articles published in major news outlets. While our results do not directly gauge the sentiment of the population, our findings have important implications regarding the social responsibility of journalists and media outlets especially in times of crisis.


On Colorings And Orientations Of Signed Graphs, Daniel Slilaty Jun 2023

Mathematics and Statistics Faculty Publications

A classical theorem independently due to Gallai and Roy states that a graph G has a proper k-coloring if and only if G has an orientation without coherent paths of length k. An analogue of this result for signed graphs is proved in this article.
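For ordinary (unsigned) graphs, where coherent paths are ordinary directed paths, the Gallai-Roy theorem can be checked by brute force on small examples: the chromatic number equals one plus the minimum, over all orientations, of the number of edges in a longest directed path. The sketch below covers only this classical statement, not the signed-graph analogue proved in the article; the function names are illustrative, and the exhaustive search is practical only for tiny graphs.

```python
from itertools import product


def is_proper(coloring, edges):
    """True if no edge joins two vertices of the same color."""
    return all(coloring[u] != coloring[v] for u, v in edges)


def chromatic_number(n, edges):
    """Smallest k with a proper k-coloring, by exhaustive search (tiny graphs only)."""
    for k in range(1, n + 1):
        if any(is_proper(c, edges) for c in product(range(k), repeat=n)):
            return k
    return 0  # only reached when n == 0


def longest_directed_path(n, arcs):
    """Number of edges in a longest directed path with no repeated vertex."""
    out = {v: [] for v in range(n)}
    for u, v in arcs:
        out[u].append(v)
    best = 0

    def extend(v, visited, length):
        nonlocal best
        best = max(best, length)
        for w in out[v]:
            if w not in visited:
                visited.add(w)
                extend(w, visited, length + 1)
                visited.remove(w)

    for start in range(n):
        extend(start, {start}, 0)
    return best


def min_longest_path(n, edges):
    """Minimum, over all 2^m orientations, of the longest directed path length."""
    return min(
        longest_directed_path(n, [(v, u) if flip else (u, v)
                                  for (u, v), flip in zip(edges, flips)])
        for flips in product((False, True), repeat=len(edges))
    )


# Gallai-Roy: chi(G) == 1 + min over orientations of the longest path length.
c5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(chromatic_number(5, c5), min_longest_path(5, c5))  # 3 2
```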


Movie Recommender System Using Matrix Factorization, Roland Fiagbe May 2023

Data Science and Data Mining

Recommendation systems are a popular and beneficial field that can help people make informed decisions automatically. This technique assists users in selecting relevant information from an overwhelming amount of available data. When it comes to movie recommendations, two common methods are collaborative filtering, which compares similarities between users, and content-based filtering, which takes a user’s specific preferences into account. However, our study focuses on the collaborative filtering approach, specifically matrix factorization. Various similarity metrics are used to identify user similarities for recommendation purposes. Our project aims to predict movie ratings for unwatched movies using the MovieLens rating dataset. We developed …
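The matrix factorization idea can be sketched as follows: each user and each movie receives a short latent vector, and a rating is predicted by their dot product. This minimal version is trained by stochastic gradient descent on the observed ratings only. The number of factors `k`, the learning rate, the regularization strength, and the toy ratings are assumptions; the sketch omits the user and item bias terms many recommenders add, and it does not use the similarity metrics mentioned in the abstract.

```python
import random


def predict(P, Q, u, i):
    """Predicted rating: dot product of user u's and item i's latent vectors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Fit P (users x k) and Q (items x k) so that P[u].Q[i] approximates r_ui,
    using SGD on squared error with L2 regularization (hypothetical defaults)."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


# Toy (user, movie, rating) triples; user 1 has not rated movie 1.
ratings = [(0, 0, 5), (0, 1, 5), (0, 2, 1), (1, 0, 5), (1, 3, 1),
           (2, 1, 1), (2, 2, 5), (2, 3, 5), (3, 0, 1), (3, 2, 5), (3, 3, 5)]
P, Q = factorize(ratings, n_users=4, n_items=4)
print(round(predict(P, Q, 1, 1), 2))  # estimated rating for an unwatched movie
```

After training, `predict(P, Q, u, i)` for an unrated pair is the quantity the project aims to estimate for unwatched movies.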


Formula 101 Using 2022 Formula One Season Data To Understand The Race Results, Christopher Garcia, Oliver Lopez May 2023

Student Scholar Symposium Abstracts and Posters

I became interested in Formula One when a friend showed me what the sport was all about. I was drawn to its action, including the battles between drivers during a race and the speed at which they take corners. In qualifying, drivers push their cars to the absolute limit to gain time over their opponents. Only drivers finishing in the top 10 receive points, from 25 for the winner down to 1 for the tenth-place driver, and those below the top ten …
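The points rule described above amounts to a lookup table. The sketch below uses the standard points scale for positions 1 to 10 (25, 18, 15, 12, 10, 8, 6, 4, 2, 1) and sums points per driver across races. It deliberately ignores the 2022 fastest-lap bonus point and sprint-race points, and the driver names in the example are placeholders.

```python
# Standard points for finishing positions 1-10 (main race only; the 2022
# fastest-lap bonus point and sprint-race points are not modeled).
POINTS = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]


def race_points(position):
    """Points for a finishing position; drivers outside the top 10 score 0."""
    return POINTS[position - 1] if 1 <= position <= 10 else 0


def season_totals(results):
    """Total points per driver from (driver, finishing position) pairs."""
    totals = {}
    for driver, position in results:
        totals[driver] = totals.get(driver, 0) + race_points(position)
    return totals


print(season_totals([("A", 1), ("B", 2), ("A", 11), ("B", 1)]))  # {'A': 25, 'B': 43}
```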


Examining The Effect Of Word Embeddings And Preprocessing Methods On Fake News Detection, Jessica Hauschild May 2023

Department of Statistics: Dissertations, Theses, and Student Work

The words people choose to use hold a lot of power, whether that be in spreading truth or deception. As listeners and readers, we do our best to understand how words are being used. There are many current methods in computer science literature attempting to embed words into numerical information for statistical analyses. Some of these embedding methods, such as Bag of Words, treat words as independent, while others, such as Word2Vec, attempt to gain information about the context of words. It is of interest to compare how well these various methods of translating text into numerical data work specifically …


UConn Baseball Batting Order Optimization, Gavin Rublewski May 2023

Honors Scholar Theses

Challenging conventional wisdom is at the very core of baseball analytics. Using data and statistical analysis, the sets of rules by which coaches make decisions can be justified, or possibly refuted. One of those sets of rules relates to the construction of a batting order. Through data collection, data adjustment, the construction of a baseball simulator, and the use of a Monte Carlo Simulation, I have assessed thousands of possible batting orders to determine the roster-specific strategies that lead to optimal run production for the 2023 UConn baseball team. This paper details a repeatable process in which basic player statistics …
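The Monte Carlo approach can be illustrated with a deliberately simple simulator; this is not the thesis's simulator. Assumptions: each batter is described by hypothetical probabilities of a single, double, or home run (anything else is an out); there are no walks, errors, or stolen bases; and every runner advances exactly as many bases as the batter. Lineups are scored by average runs over simulated games, and the search keeps the best of many randomly shuffled orders.

```python
import random


def simulate_game(order, rng, innings=9):
    """Runs in one simulated game. Each batter is a hypothetical tuple
    (p_single, p_double, p_homer); any other outcome is an out, and every
    batter needs a positive out probability or an inning never ends.
    The lineup position carries over from one inning to the next."""
    runs, idx = 0, 0
    for _ in range(innings):
        outs, bases = 0, [False, False, False]  # first, second, third
        while outs < 3:
            p1, p2, p3 = order[idx % len(order)]
            idx += 1
            u = rng.random()
            if u < p1:
                advance = 1
            elif u < p1 + p2:
                advance = 2
            elif u < p1 + p2 + p3:
                advance = 4
            else:
                outs += 1
                continue
            # Simplification: batter and every runner move exactly `advance` bases.
            occupied = [b + 1 for b in range(3) if bases[b]] + [0]  # 0 = batter
            bases = [False, False, False]
            for base in occupied:
                if base + advance >= 4:
                    runs += 1
                else:
                    bases[base + advance - 1] = True
    return runs


def expected_runs(order, n_games=2000, seed=0):
    """Monte Carlo estimate of average runs per game for one batting order."""
    rng = random.Random(seed)
    return sum(simulate_game(order, rng) for _ in range(n_games)) / n_games


def best_of_random_orders(players, n_orders=200, n_games=500, seed=0):
    """Score randomly shuffled lineups and keep the best (score, order) pair."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_orders):
        order = players[:]
        rng.shuffle(order)
        score = expected_runs(order, n_games, seed=rng.randrange(2**32))
        if best is None or score > best[0]:
            best = (score, order)
    return best


HR, OUT = (0.0, 0.0, 1.0), (0.0, 0.0, 0.0)
print(expected_runs([HR] + [OUT] * 8, n_games=10))  # 4.0
print(expected_runs([OUT] * 8 + [HR], n_games=10))  # 3.0
```

In the example, a batter who always homers produces one more run per game when leading off than when batting ninth, because the leadoff spot gets more plate appearances; real inputs would be estimated from player statistics.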


Small But Mighty: Examining The Utility Of Microstatistics In Modeling Ice Hockey, Matt Palmer May 2023

Senior Honors Theses

As research into hockey analytics continues, an increasing number of metrics are being introduced into the knowledge base of the field, creating a need to determine whether various stats are useful or simply add noise to the discussion. This paper examines microstatistics – manually tracked metrics which go beyond the NHL’s publicly released stats – both through the lens of meta-analytics (which attempt to objectively assess how useful a metric is) and modeling game probabilities. Results show that while there is certainly room for improvement in understanding and use of microstats in modeling, the metrics overall represent an area of …


A Monte Carlo Analysis Of Nonprobability Sampling & Post Hoc Corrections, Julia Hong May 2023

Masters Theses & Specialist Projects

Nonprobability samples are often used in place of probability samples because the former are less trouble and less expensive. Unfortunately, it is difficult to determine how well a sample represents population parameters when using nonprobability samples. Researchers attempt to mitigate the disadvantages of nonprobability sampling by performing post hoc corrections, but this adjustment may not successfully undo the effects of nonprobability sampling. To examine these effects, a Monte Carlo simulation was conducted to create a pseudo-population from which samples were drawn. Forty-one conditions were replicated 10,000 times each, with each sample consisting of 100 observations. A post-stratification adjustment was made …
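A scaled-down version of this kind of simulation can be sketched in Python. All settings are assumptions for illustration: a pseudo-population of 10,000 units in two strata with different outcome means, nonprobability samples of 100 in which stratum A units are three times as likely to be chosen (weighted sampling without replacement using Efraimidis-Spirakis keys), and post-stratification to the known population stratum shares. Far fewer replications than the study's 10,000 are used by default to keep the run short.

```python
import heapq
import random
from statistics import fmean


def make_population(n=10_000, share_a=0.3, seed=1):
    """Pseudo-population of (stratum, y); stratum A has a higher mean outcome."""
    rng = random.Random(seed)
    pop = []
    for _ in range(n):
        stratum = "A" if rng.random() < share_a else "B"
        pop.append((stratum, rng.gauss(50 if stratum == "A" else 40, 10)))
    return pop


def nonprob_sample(pop, k, weights, rng):
    """Weighted sampling without replacement (Efraimidis-Spirakis keys u**(1/w)):
    units from heavily weighted strata are over-represented."""
    return heapq.nlargest(k, pop, key=lambda unit: rng.random() ** (1 / weights[unit[0]]))


def post_stratified_mean(sample, pop_shares):
    """Weight each stratum's sample mean by its known population share.
    Assumes every stratum appears in the sample (fmean fails on an empty list)."""
    return sum(share * fmean([y for s, y in sample if s == h])
               for h, share in pop_shares.items())


def simulate(reps=200, k=100, seed=2):
    """Average error (bias) of the raw and post-stratified sample means."""
    pop = make_population()
    true_mean = fmean([y for _, y in pop])
    shares = {h: sum(1 for s, _ in pop if s == h) / len(pop) for h in ("A", "B")}
    rng = random.Random(seed)
    raw_err, ps_err = [], []
    for _ in range(reps):
        sample = nonprob_sample(pop, k, {"A": 3.0, "B": 1.0}, rng)
        raw_err.append(fmean([y for _, y in sample]) - true_mean)
        ps_err.append(post_stratified_mean(sample, shares) - true_mean)
    return fmean(raw_err), fmean(ps_err)


raw_bias, ps_bias = simulate()
print(f"raw bias {raw_bias:.2f}, post-stratified bias {ps_bias:.2f}")
```

Because selection here depends only on the stratum, post-stratification removes nearly all of the bias. When selection also depends on the outcome within strata, as it may in real nonprobability samples, the adjustment can fall short, which is the kind of failure the study examines.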


Interpretable Learning In Multivariate Big Data Analysis For Network Monitoring, José Camacho, Rasmus Bro, David Kotz Apr 2023

Dartmouth Scholarship

There is an increasing interest in the development of new data-driven models useful to assess the performance of communication networks. For many applications, like network monitoring and troubleshooting, a data model is of little use if it cannot be interpreted by a human operator. In this paper, we present an extension of the Multivariate Big Data Analysis (MBDA) methodology, a recently proposed interpretable data analysis tool. In this extension, we propose a solution to the automatic derivation of features, a cornerstone step for the application of MBDA when the amount of data is massive. The resulting network monitoring approach allows …


Modeling And Fitting Two-Way Tables Containing Outliers, David L. Farnsworth Feb 2023

Articles

A model is proposed for two-way tables of measurement data containing outliers. The two independent variables are categorical and error free. Neither missing values nor replication are present. The model consists of the sum of a customary additive part that can be fit using least squares and a part that is composed of outliers. Recommendations are made for methods for identifying cells containing outliers and for fitting the model. A graph of the observations is used to determine the outliers’ locations. For all cells containing an outlier, replacement values are determined simultaneously using a classical missing-data tool. The result is …
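Under the additive model, the least-squares fit of a complete table with one observation per cell is row mean + column mean - grand mean. The sketch assumes the outlier cells have already been located (the abstract uses a graph for that step) and that the classical missing-data tool is Yates's missing-value formula, x = (rR + cC - G) / ((r - 1)(c - 1)), where r and c are the numbers of rows and columns, R and C are the totals of the remaining cells in the flagged cell's row and column, and G is the total of all remaining cells. For several outliers, the sketch fills the flagged cells simultaneously by iterating the additive fit until they stop changing; this iteration is an assumption, not necessarily the article's procedure.

```python
def additive_fit(table):
    """Least-squares additive fit for a complete two-way table with one
    observation per cell: fitted[i][j] = row_mean[i] + col_mean[j] - grand_mean."""
    r, c = len(table), len(table[0])
    row_means = [sum(row) / c for row in table]
    col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]
    grand = sum(row_means) / r
    return [[row_means[i] + col_means[j] - grand for j in range(c)] for i in range(r)]


def yates_value(table, p, q):
    """Missing-value formula for one cell (p, q), ignoring its observed value:
    the value the additive fit would reproduce exactly in that cell."""
    r, c = len(table), len(table[0])
    R = sum(table[p][j] for j in range(c) if j != q)
    C = sum(table[i][q] for i in range(r) if i != p)
    G = sum(sum(row) for row in table) - table[p][q]
    return (r * R + c * C - G) / ((r - 1) * (c - 1))


def fill_outliers(table, cells, iters=200):
    """Replace several flagged cells at once by iterating: fit the additive
    model, overwrite the flagged cells with their fitted values, repeat."""
    work = [row[:] for row in table]
    for _ in range(iters):
        fitted = additive_fit(work)
        for i, j in cells:
            work[i][j] = fitted[i][j]
    return work


a, b = [0, 2, -1, 3], [1, 0, 4, -2]
table = [[10 + ai + bj for bj in b] for ai in a]  # exactly additive
table[1][2] = 100  # plant an outlier; the additive value is 16
print(yates_value(table, 1, 2))  # 16.0
```

The estimated outlier in each flagged cell is then the observed value minus its replacement (here 100 - 16 = 84).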


Tennessee Brewconomy: Navigating The Wholesale Beer Tax Landscape, Lauren E. Dansbury Jan 2023

Science University Research Symposium (SURS)

In 2013, Tennessee transitioned from a price-based wholesale tax model to a per-barrel assessment. This research examines the repercussions of this tax reform, assessing its impact on the brewing and wholesale distribution sectors. Despite the shift, Tennessee has had the nation's highest wholesale beer tax for 16 consecutive years. The study examines the opportunity costs associated with this elevated tax, exploring alternative uses for the funds. Utilizing data on annual revenue collected by wholesalers from 2019 to 2022, segmented by city and county, the research provides actionable insights advocating for a reduction in the wholesale tax. The argument posits …


Graphs Without A 2C3-Minor And Bicircular Matroids Without A U3,6-Minor, Daniel Slilaty Jan 2023

Mathematics and Statistics Faculty Publications

In this note we characterize all graphs without a 2C3-minor. A consequence of this result is a characterization of the bicircular matroids with no U3,6-minor.


Odd Solutions To Systems Of Inequalities Coming From Regular Chain Groups, Daniel Slilaty Jan 2023

Mathematics and Statistics Faculty Publications

Hoffman’s theorem on feasible circulations and Ghouila-Houri’s theorem on feasible tensions are classical results of graph theory. Camion generalized these results to systems of inequalities over regular chain groups. An analogue of Camion’s result is proved in which solutions can be forced to be odd-valued. The obtained result also generalizes the results of Pretzel and Youngs as well as Slilaty. It is also shown how Ghouila-Houri’s result can be used to give a new proof of the graph-coloring theorem of Minty and Vitaver.


Classification Of Adult Income Using Decision Tree, Roland Fiagbe Jan 2023

Data Science and Data Mining

The decision tree is a commonly used data mining method for classification tasks. It is a tree-based supervised machine learning algorithm that classifies or makes predictions by following a path determined by the answers to previous questions. Generally, the decision tree algorithm divides data into branch-like segments that form a tree with a root, internal nodes, and leaves. This project explores the decision tree methodology and applies it to the Adult Income dataset from the UCI Machine Learning Repository to determine whether a person earns over 50K per year and to identify the factors that improve …
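The core step a decision tree repeats at each node, choosing the question that splits the data into the purest branches, can be sketched for one numeric feature using Gini impurity as the split criterion (an assumption; entropy is another common choice). The feature values and labels in the example are toy data, not drawn from the Adult Income dataset.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k**2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(xs, ys):
    """Best threshold t for the question 'x <= t?' on one numeric feature,
    chosen to minimize the size-weighted Gini impurity of the two branches.
    Returns (None, gini(ys)) when no split improves on the parent node."""
    best_t, best_score = None, gini(ys)
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right branch empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Toy data: hours worked per week vs. income class.
hours = [20, 35, 40, 45, 50, 60]
labels = ["<=50K", "<=50K", "<=50K", ">50K", ">50K", ">50K"]
print(best_split(hours, labels))  # (40, 0.0)
```

A full tree applies this search to every feature at each node and recurses on the two branches until a stopping rule, such as a maximum depth or a minimum node size, is met.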


Forecasting Remission Time Of A Treatment Method For Leukemia As An Application To Statistical Inference Approach, Ahmed Galal Atia, Mahmoud Mansour, Rashad Mohamed El-Sagheer, B. S. El-Desouky Jan 2023

Basic Science Engineering

In this paper, the Weibull-Linear Exponential distribution (WLED) is investigated to determine whether it fits real clinical data well. These data represent the duration of remission achieved by a certain drug used in the treatment of leukemia for a group of patients. Statistical inference is used to estimate the parameters of the WLED from the fitted data set. The estimated parameters are used to evaluate the survival and hazard functions and hence to assess the treatment method by forecasting the duration of patients' remission times. A two-sample prediction approach has been applied to …


Hamilton Cycles In Bidirected Complete Graphs, Arthur Busch, Mohammed A. Mutar, Daniel Slilaty Dec 2022

Mathematics and Statistics Faculty Publications

Zaslavsky observed that the topics of directed cycles in directed graphs and alternating cycles in edge 2-colored graphs have a common generalization in the study of coherent cycles in bidirected graphs. There are classical theorems by Camion, Harary and Moser, Häggkvist and Manoussakis, and Saad which relate strong connectivity and Hamiltonicity in directed "complete" graphs and edge 2-colored "complete" graphs. We prove two analogues to these theorems for bidirected "complete" signed graphs.


Natural Language Processing For Disaster Tweets, Akinyemi D. Apampa, Nan Li Dec 2022

Publications and Research

Our goal is to establish an automatic model that identifies which tweets are about natural disasters based on the content of the tweets. Our method is to construct a decision tree based on keyword searching. We will construct the model using 7,645 tweets and test our model on 3,465 tweets as an assessment of the performance.


Improving Data-Driven Infrastructure Degradation Forecast Skill With Stepwise Asset Condition Prediction Models, Kurt R. Lamm, Justin D. Delorit, Michael N. Grussing, Steven J. Schuldt Aug 2022

Faculty Publications

Organizations with large facility and infrastructure portfolios have used asset management databases for over ten years to collect and standardize asset condition data. Decision makers use these data to predict asset degradation and expected service life, enabling prioritized maintenance, repair, and renovation actions that reduce asset life-cycle costs and achieve organizational objectives. However, these asset condition forecasts are calculated using standardized, self-correcting distribution models that rely on poorly fit, continuous functions. This research presents four stepwise asset condition forecast models that utilize historical asset inspection data to improve prediction accuracy: (1) Slope, (2) Weighted Slope, (3) Condition-Intelligent Weighted Slope, and (4) …


A Bayesian Programming Approach To Car-Following Model Calibration And Validation Using Limited Data, Franklin Abodo Jun 2022

FIU Electronic Theses and Dissertations

Traffic simulation software is used by transportation researchers and engineers to design and evaluate changes to roadway networks. Underlying these simulators are mathematical models of microscopic driver behavior from which macroscopic measures of flow and congestion can be recovered. Many models are intended to apply to only a subset of possible traffic scenarios and roadway configurations, while others do not have any explicit constraint on their applicability. Work zones on highways are one scenario for which no model invented to date has been shown to accurately reproduce realistic driving behavior. This makes it difficult to optimize for safety and other …


Forecasting Country Conflict Using Statistical Learning Methods, Sarah Neumann, Darryl K. Ahner, Raymond R. Hill Jun 2022

Faculty Publications

Purpose — This paper aims to examine whether changing the clustering of countries within a United States Combatant Command (COCOM) area of responsibility improves the forecasting of conflict. Design/methodology/approach — In this paper, statistical learning methods are used to create new country clusters that are then used in a comparative analysis of model-based conflict prediction. Findings — In this study, a reorganization of the countries assigned to specific areas of responsibility is shown to improve the ability of models to predict conflict. Research limitations/implications — The study is based on actual historical data and is purely data driven. …


Pilot Development: An Empirical Mixed-Method Analysis, Jonathan Slottje, Jason Anderson, John M. Dickens, Adam D. Reiman Jun 2022

Faculty Publications

Purpose — Pilot upgrade training is critical to aircraft and passenger safety. This study aims to identify variances in the US Air Force C-130J pilot upgrade training based on geographic location and provide a model to enhance policy that will impact future pilot training efforts that lower cost and increase operator quality and proficiency.
Design/methodology/approach This research employed a mixed-method approach. First, the authors collected data and analyzed 90 C-130J pilots' aviation records and then contextualized this analysis with interviews of experts. Finally, the authors present a modified version of Six Sigma's define–measure–analyze–improve–control (DMAIC) that identifies and reduces the …


Transportation Service Level Impact On Aircraft Availability, Vincent Mclean, Adam D. Reiman Jun 2022

Faculty Publications

Purpose — Aircraft fail to meet mission capable rate goals due to a lack of supply of aircraft parts in inventory where the aircraft breaks. This triggers an order at the repair location. To maximize mission capable rate, the time from order to delivery needs to be minimized. The purpose of this research is to examine the case of three airfields for the order to delivery time of mission critical aircraft parts for a specific aircraft type. Design/methodology/approach — This research captured data from three information systems to assess the order fulfillment process. The data were analyzed to determine the …