Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

Statistical Theory

Articles 1 - 30 of 1668

Full-Text Articles in Statistics and Probability

Model Selection Through Cross-Validation For Supervised Learning Tasks With Manifold Data, Derek Brown Jan 2024

The Journal of Purdue Undergraduate Research

No abstract provided.


Sensitivity Analysis Of Prior Distributions In Regression Model Estimation, Ayoade I Adewole, Oluwatoyin K. Bodunwa Jan 2024

Al-Bahir Journal for Engineering and Pure Sciences

Bayesian inferences depend solely on the specification and accuracy of the likelihoods and prior distributions of the observed data. The research delved into the Bayesian estimation of regression models to reduce the impact of some of the problems posed by conventional methods of estimating regression models, such as handling complex models, small sample sizes, and the inclusion of background information in the estimation procedure. Posterior distributions are based on the prior distributions and the accuracy of the data, which is the fundamental principle that Bayesian statistics relies on to produce accurate final model estimates. Sensitivity analysis is an essential part of mathematical model validation in obtaining …


Machine Learning Approaches For Cyberbullying Detection, Roland Fiagbe Jan 2024

Data Science and Data Mining

Cyberbullying refers to the act of bullying using electronic means and the internet. In recent years, this act has been identified as a major problem among young people and even adults. It can negatively impact one’s emotions and lead to adverse outcomes like depression, anxiety, harassment, and suicide, among others. This has led to the need to employ machine learning techniques to automatically detect cyberbullying and prevent it on various social media platforms. In this study, we want to analyze the combination of some Natural Language Processing (NLP) algorithms (such as Bag-of-Words and TFIDF) with some popular machine learning …


Predicting Superconducting Critical Temperature Using Regression Analysis, Roland Fiagbe Jan 2024

Data Science and Data Mining

This project estimates a regression model to predict the superconducting critical temperature based on variables extracted from the superconductor’s chemical formula. The regression model, combined with stepwise variable selection, gives a reasonable predictive model with a lower prediction error (MSE). Variables extracted based on atomic radius, valence, atomic mass and thermal conductivity appeared to contribute most to the predictive model.


Microplate-Like Metal Pyrophosphate Engineered On Ni-Foam Towards Multifunctional Electrode Material For Energy Conversion And Storage, Rishabh Srivastava Dec 2023

Electronic Theses & Dissertations

High clean energy demand, the dire need for sustainable development, and low carbon footprints are a few of the challenges leading researchers to pursue research and development of high-performance energy devices. The development of materials used in energy devices is currently focused on enhancing the performance, electronic properties, and durability of devices. Tuning the attributes of transition metals using pyrophosphate (P2O7) ligand moieties can be a promising approach to meet the requirements of energy devices such as water electrolyzers and supercapacitors, although such a material’s configuration has rarely been explored for this purpose.

Herein, we grow …


Exploration And Statistical Modeling Of Profit, Caleb Gibson Dec 2023

Undergraduate Honors Theses

For any company involved in sales, maximization of profit is the driving force that guides all decision-making. Many factors can influence how profitable a company can be, including external factors like changes in inflation or consumer demand and internal factors like pricing and product cost. By understanding specific trends in its own internal data, a company can readily identify problem areas or potential growth opportunities to help increase profitability.

In this discussion, we use an extensive data set to examine how a company might analyze their own data to identify potential changes the company might investigate to drive better performance. Based …


The Private Pilot Check Ride: Applying The Spacing Effect Theory To Predict Time To Proficiency For The Practical Test, Michael Scott Harwin Dec 2023

Theses and Dissertations

This study examined the relationship between a set of targeted factors and the total flight time students needed to become ready to take the private pilot check ride. The study was grounded in Ebbinghaus’s (1885/1913/2013) forgetting curve theory and spacing effect, and Ausubel’s (1963) theory of meaningful learning. The research factors included (a) training time to proficiency, which represented the number of training days needed to become check-ride ready; (b) flight training program (Part 61 vs. Part 141); (c) organization offering the training program (2- or 4-year college/university vs. FBO); (d) scheduling policy (mandated vs. student-driven); and demographical variables, which …


Nonparametric Derivative Estimation Using Penalized Splines: Theory And Application, Bright Antwi Boasiako Nov 2023

Doctoral Dissertations

This dissertation is in the field of nonparametric derivative estimation using penalized splines. It is conducted in two parts. In the first part, we study the L2 convergence rates of estimating derivatives of mean regression functions using penalized splines. In 1982, Stone provided the optimal rates of convergence for estimating derivatives of mean regression functions using nonparametric methods. Using these rates, Zhou et al. showed in their 2000 paper that the MSE of derivative estimators based on regression splines approaches zero at the optimal rate of convergence. Also, in 2019, Xiao showed that, under some general conditions, penalized spline estimators …


Statistical Inference On Lung Cancer Screening Using The National Lung Screening Trial Data., Farhin Rahman Aug 2023

Electronic Theses and Dissertations

This dissertation consists of three research projects on cancer screening probability modeling. In these projects, the three key modeling parameters (sensitivity, sojourn time, transition density) for cancer screening were estimated, and the long-term outcomes (including overdiagnosis as one outcome), the optimal screening time/age, the lead time distribution, and the probability of overdiagnosis at the future screening time were simulated to provide a statistical perspective on the effectiveness of cancer screening programs. In the first part of this dissertation, statistical inference was conducted for male and female smokers using the National Lung Screening Trial (NLST) chest X-ray data. A …


A Comparison Of Confidence Intervals In State Space Models, Jinyu Du Jul 2023

Statistical Science Theses and Dissertations

This thesis develops general procedures for constructing confidence intervals (CIs) for the error disturbance parameters (standard deviations) and transformations of the error disturbance parameters in time-invariant state space models (SSMs). With only a set of observations, estimating individual error disturbance parameters accurately in the presence of other unknown parameters in SSMs is a very challenging problem. We attempted to construct four different types of confidence intervals (Wald, likelihood ratio, score, and higher-order asymptotic intervals) for both the simple local level model and the general time-invariant SSMs. We show that for a simple local level model, both the …


Addressing The Impact Of Time-Dependent Social Groupings On Animal Survival And Recapture Rates In Mark-Recapture Studies, Alexandru M. Draghici Jun 2023

Electronic Thesis and Dissertation Repository

Mark-recapture (MR) models typically assume that individuals under study have independent survival and recapture outcomes. One such model of interest is known as the Cormack-Jolly-Seber (CJS) model. In this dissertation, we conduct three major research projects focused on studying the impact of violating the independence assumption in MR models along with presenting extensions which relax the independence assumption. In the first project, we conduct a simulation study to address the impact of failing to account for pair-bonded animals having correlated recapture and survival fates on the CJS model. We examined the impact of correlation on the likelihood ratio test (LRT), …


Testing For Dice Control Based On Observations Of The Length Of The Shooter's Hand, Stewart N. Ethier, Hokwon Cho May 2023

International Conference on Gambling & Risk Taking

No abstract provided.


Uconn Baseball Batting Order Optimization, Gavin Rublewski May 2023

Honors Scholar Theses

Challenging conventional wisdom is at the very core of baseball analytics. Using data and statistical analysis, the sets of rules by which coaches make decisions can be justified, or possibly refuted. One of those sets of rules relates to the construction of a batting order. Through data collection, data adjustment, the construction of a baseball simulator, and the use of a Monte Carlo Simulation, I have assessed thousands of possible batting orders to determine the roster-specific strategies that lead to optimal run production for the 2023 UConn baseball team. This paper details a repeatable process in which basic player statistics …


Two Sample Statistical Test For Location Parameters, Narinder Kumar, Arun Kumar Apr 2023

Journal of Modern Applied Statistical Methods

A class of distribution-free tests for the homogeneity of location parameters is proposed and compared with different competitors in terms of Pitman asymptotic relative efficiency. A numerical example is provided and a simulation study is made to check the performance of the tests.


High-Dimensional Variable Selection Via Knockoffs Using Gradient Boosting, Amr Essam Mohamed Apr 2023

Dissertations

As data continue to grow rapidly in size and complexity, efficient and effective statistical methods are needed to detect the important variables/features. Variable selection is one of the most crucial problems in statistical applications. This problem arises when one wants to model the relationship between the response and the predictors. The goal is to reduce the number of variables to a minimal set of explanatory variables that are truly associated with the response of interest to improve the model accuracy. Effectively choosing the true influential variables and controlling the False Discovery Rate (FDR) without sacrificing power has been a challenge …


High Dimensional Data Analysis: Variable Screening And Inference, Lei Fang Jan 2023

Theses and Dissertations--Statistics

This dissertation focuses on the problem of high dimensional data analysis, which arises in many fields including genomics, finance, and social sciences. In such settings, the number of features or variables is much larger than the number of observations, posing significant challenges to traditional statistical methods.

To address these challenges, this dissertation proposes novel methods for variable screening and inference. The first part of the dissertation focuses on variable screening, which aims to identify a subset of important variables that are strongly associated with the response variable. Specifically, we propose a robust nonparametric screening method to effectively select the predictors …


Bayesian Methods For Graphical Models With Neighborhood Selection., Sagnik Bhadury Dec 2022

Electronic Theses and Dissertations

Graphical models determine associations between variables through the notion of conditional independence. Gaussian graphical models are a widely used class of such models, where the relationships are formalized by non-null entries of the precision matrix. However, in high-dimensional cases, covariance estimates are typically unstable. Moreover, it is natural to expect only a few significant associations to be present in many realistic applications. This necessitates the injection of sparsity techniques into the estimation method. Classical frequentist methods, like GLASSO, use penalization techniques for this purpose. Fully Bayesian methods, by contrast, are slow because they require iteratively sampling over a quadratic …


Mle And Eap Methods For Estimating Ability Scores For Data Of Varying Sample Size And Item Length, Sahar Taji Dec 2022

Graduate Theses and Dissertations

In this research, the performance of two popular estimators, the Maximum Likelihood Estimator (MLE) and the Bayesian Expected a Posteriori (EAP) estimator, is studied and compared in estimating the latent ability score in an Item Response Theory (IRT) model. The 2-Parameter Logistic (2PL) IRT model, which is characterized by difficulty and discrimination item parameters, is used to estimate the latent ability scores. Several datasets are generated for a variety of sample size and item length values. Monte Carlo simulation is used to analyze the performance of the estimators. Results show that MLE produces reliable results with low root mean square error (RMSE) across all datasets. …
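
The two estimators named in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's code: the grid of ability values, the standard normal prior used for EAP, the item parameters in the usage note, and all function names are assumptions made for illustration only.

```python
import math

# Ability grid on [-4, 4]; the range and step are illustrative assumptions.
GRID = [-4.0 + 0.01 * k for k in range(801)]


def p_correct(theta, a, b):
    """2PL probability of a correct response (a: discrimination, b: difficulty)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta, responses, items):
    """Log-likelihood of a 0/1 response pattern at ability theta."""
    ll = 0.0
    for u, (a, b) in zip(responses, items):
        p = p_correct(theta, a, b)
        ll += math.log(p) if u == 1 else math.log(1.0 - p)
    return ll


def mle_ability(responses, items):
    """Grid-search MLE: the grid point that maximizes the likelihood."""
    return max(GRID, key=lambda t: log_likelihood(t, responses, items))


def eap_ability(responses, items):
    """EAP: posterior mean of theta under an assumed N(0, 1) prior."""
    weights = [
        math.exp(log_likelihood(t, responses, items) - 0.5 * t * t)
        for t in GRID
    ]
    total = sum(weights)
    return sum(t * w for t, w in zip(GRID, weights)) / total
```

For a hypothetical four-item test with `items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0), (1.5, 0.5)]` and an all-correct pattern `[1, 1, 1, 1]`, the likelihood increases without bound in the ability, so the grid MLE lands on the upper edge of the grid, while the EAP estimate stays finite because the prior pulls it toward zero.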


Functional Data Analysis Of Covid-19, Nichole L. Fluke Nov 2022

Mathematics & Statistics ETDs

This thesis applies Functional Data Analysis (FDA) to COVID data. The data comprise counts of new COVID cases, hospitalized COVID patients, and new COVID deaths for all the states and regions in the United States, from March 1, 2020 through March 31, 2021. The FDA smooths the data and looks for similarities or differences between the states and regions. The data also show which states and regions stand out from the others and which ones are similar. Also shown …


Statistical Roles Of The G-Expectation Framework In Model Uncertainty: The Semi-G-Structure As A Stepping Stone, Yifan Li Oct 2022

Electronic Thesis and Dissertation Repository

The G-expectation framework is a generalization of the classical probability system based on the sublinear expectation to deal with phenomena that cannot be described by a single probabilistic model. These phenomena are closely related to the long-existing concern about model uncertainty in statistics. However, the distributions and independence in the G-framework are quite different from the classical setup. These distinctions bring difficulty when applying the idea of this framework to general statistical practice. Therefore, a fundamental and unavoidable problem is how to better understand G-version concepts from a statistical perspective.

To explore this problem, this thesis establishes a new substructure …


Bayesian Estimation Of The Intensity Function Of A Non-Homogeneous Poisson Process, James Jensen Oct 2022

Theses

In this paper we explore Bayesian inference and its application to the problem of estimating the intensity function of a non-homogeneous Poisson process. These processes model the behavior of phenomena in which one or more events, known as arrivals, occur independently of one another over a certain period of time. We are concerned with the number of events occurring during particular time intervals across several realizations of the process. We show that given sufficient data, we are able to construct a piecewise-constant function which accurately estimates the mean rates on particular intervals. Further, we show that as we reduce these …
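
A piecewise-constant Bayesian estimate of this kind can be sketched in a few lines of Python. The abstract does not state which prior the thesis uses; the conjugate Gamma(alpha, beta) prior below, the function name, and the parameter names are assumptions chosen only because they give a closed-form posterior mean for each interval's rate.

```python
def posterior_mean_rates(realizations, edges, alpha=1.0, beta=1.0):
    """Posterior mean rate on each interval [edges[j], edges[j+1]).

    realizations: list of lists of arrival times, one list per observed
                  realization of the process.
    edges:        increasing interval boundaries.
    alpha, beta:  shape and rate of an assumed Gamma prior on each rate.
    """
    m = len(realizations)
    rates = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Total number of arrivals in this interval across all realizations.
        count = sum(
            1 for arrivals in realizations for t in arrivals if lo <= t < hi
        )
        # Gamma-Poisson conjugacy: the posterior is
        # Gamma(alpha + count, beta + m * (hi - lo)); report its mean.
        rates.append((alpha + count) / (beta + m * (hi - lo)))
    return rates
```

As the number of realizations grows, the prior terms `alpha` and `beta` become negligible and each value approaches the average number of arrivals per unit time on its interval.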


The Q-Analogue Of The Extended Generalized Gamma Distribution, Wenhao Chen Aug 2022

Undergraduate Student Research Internships Conference

This project introduces a flexible univariate probability model referred to as the q-analogue of the Extended Generalized Gamma (or q-EGG) distribution, which encompasses the majority of the most frequently used continuous distributions, including the gamma, Weibull, logistic, type-1 and type-2 beta, Gaussian, Cauchy, Student-t and F. Closed form representations of its moments and cumulative distribution function are provided. Additionally, computational techniques are proposed for determining estimates of its parameters. Both the method of moments and the maximum likelihood approach are utilized. The effect of each parameter is also graphically illustrated. Certain data sets are modeled with q-EGG distributions; goodness of …


To Logit Or Not To Logit Data In The Unit Interval: A Simulation Study, Kayode Idris Hamzat Aug 2022

Major Papers

In this paper, we recommend a mechanism, based on quantile estimation of data between 0 and 1, for determining whether or not to logit data in the unit interval. Using a simulated dataset generated from a Beta regression model, we find that the estimated quantiles for this model perform better than those based on linear quantile regression with a logit transformation.

Further, we investigate the performance of the quantile regression estimators based on the LQR and we conclude that it is better than those based on the Beta regression when the distribution is contaminated with 10% uniform numbers …


New Developments On The Estimability And The Estimation Of Phase-Type Actuarial Models, Cong Nie Jul 2022

Electronic Thesis and Dissertation Repository

This thesis studies the estimability and the estimation methods for two models based on Markov processes: the phase-type aging model (PTAM), which models the human aging process, and the discrete multivariate phase-type model (DMPTM), which can be used to model multivariate insurance claim processes.

The principal contributions of this thesis can be categorized into two areas. First, an objective measure of estimability is proposed to quantify estimability in the context of statistical models. Existing methods for assessing estimability require the subjective specification of thresholds, which potentially limits their usefulness. Unlike these methods, the proposed measure of estimability is objective. In …


How Blockchain Solutions Enable Better Decision Making Through Blockchain Analytics, Sammy Ter Haar May 2022

Information Systems Undergraduate Honors Theses

Since the advent of computers, data scientists have been able to engineer devices that increase individuals’ opportunities to communicate with each other. In the 1990s, the internet took over, with many people not understanding its utility. Fast forward 30 years, and we cannot live without our connection to the internet. The early internet, in which individuals posted blogs for others to read, was an internet of information known as Web 1.0. As we progressed, platforms became social, allowing individuals in different areas to communicate and engage with each other; this was known as Web 2.0. As Dr. …


Advancements In Gaussian Process Learning For Uncertainty Quantification, John C. Nicholson May 2022

All Dissertations

Gaussian processes are among the most useful tools in modeling continuous processes in machine learning and statistics. The research presented provides advancements in uncertainty quantification using Gaussian processes from two distinct perspectives. The first provides a more fundamental means of constructing Gaussian processes which take on arbitrary linear operator constraints in a much more general framework than its predecessors; the second addresses the calibration of state-aware parameters in computer models. If the value of a process is known at a finite collection of points, one may use Gaussian processes to construct a surface which interpolates these values to …


Aberrant Responding With Underlying Dominance And Unfolding Response Processes: Examining Model Fit And Performance Of Person-Fit Statistics, Jennifer A. Reimers May 2022

Graduate Theses and Dissertations

Researchers have recognized that respondents may not answer items in a way that accurately reflects their attitude or trait level being measured. The resulting response data that deviates from what would be expected has been shown to have significant effects on the psychometric properties of a scale and analytical results. However, many studies that have investigated the detection of aberrant data and its effects have done so using dominance item response theory (IRT) models. It is unknown whether the impacts of aberrant data and the methodology used to identify aberrant responding when using dominance IRT models apply similarly when scales …


On Misuses Of The Kolmogorov–Smirnov Test For One-Sample Goodness-Of-Fit, Anthony Zeimbekakis Apr 2022

Honors Scholar Theses

The Kolmogorov–Smirnov (KS) test is one of the most popular goodness-of-fit tests for comparing a sample with a hypothesized parametric distribution. Nevertheless, it has often been misused. The standard one-sample KS test applies to independent, continuous data with a hypothesized distribution that is completely specified. It is not uncommon, however, to see in the literature that it was applied to dependent, discrete, or rounded data, with hypothesized distributions containing estimated parameters. For example, it has been "discovered" multiple times that the test is too conservative when the parameters are estimated. We demonstrate misuses of the one-sample KS test in three …
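
The conservativeness mentioned in this abstract, which arises when parameters are estimated from the same sample, can be seen in a small simulation. The sketch below is not taken from the thesis: the normal data, the sample size, the number of replications, the asymptotic 5% critical value 1.358/sqrt(n), and the function names are all illustrative assumptions.

```python
import math
import random


def normal_cdf(x, mu, sigma):
    """CDF of the normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(sample, mu, sigma):
    """One-sample KS distance between the empirical CDF and N(mu, sigma^2)."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x, mu, sigma)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def rejection_rates(n=50, reps=2000, seed=0):
    """Rejection rates under a true null: specified vs. estimated parameters."""
    rng = random.Random(seed)
    crit = 1.358 / math.sqrt(n)  # asymptotic 5% critical value (assumed here)
    specified = 0
    estimated = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
        # Correct use: the hypothesized distribution is fully specified.
        specified += ks_statistic(sample, 0.0, 1.0) > crit
        # Misuse: parameters estimated from the same sample being tested.
        estimated += ks_statistic(sample, mean, sd) > crit
    return specified / reps, estimated / reps
```

Calling `rejection_rates()` returns the two empirical rejection rates. With estimated parameters the fitted distribution is pulled toward the data, so the statistic tends to be smaller and the test rejects far less often than the nominal level.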


Parametric And Reliability Estimation Of The Kumaraswamy Generalized Distribution Based On Record Values, Mohd. Arshad, Qazi J. Azhad Jan 2022

Journal of Modern Applied Statistical Methods

A general family of distributions, namely the Kumaraswamy generalized (Kw-G) family, is considered for estimation of the unknown parameters and reliability function based on record data from the Kw-G distribution. The maximum likelihood estimators (MLEs) are derived for the unknown parameters and reliability function, along with their confidence intervals. A Bayesian study is carried out under symmetric and asymmetric loss functions in order to find the Bayes estimators for the unknown parameters and reliability function. Future record values are predicted using Bayesian and non-Bayesian approaches, based on numerical examples and a Monte Carlo simulation.


Does The Type Of Records Affect The Estimates Of The Parameters?, Ayush Tripathi, Umesh Singh, Sanjay Kumar Singh Jan 2022

Journal of Modern Applied Statistical Methods

The maximum likelihood estimation of the unknown parameters of inverse Rayleigh and exponential distributions is discussed based on lower and upper records. The aim is to study the effect of the type of records on the behavior of the corresponding estimators. Mean squared errors are calculated through simulation to study the behavior of the estimators. The results shall be of interest in situations where the data can be obtained in the form of either of the two types of records and the experimenter must decide between these two for estimation of the unknown parameters of the distribution.