Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

12,030 Full-Text Articles 18,559 Authors 5,957,324 Downloads 277 Institutions

All Articles in Statistics and Probability

Two-Stage Approach For Forensic Handwriting Analysis, Ashlan J. Simpson, Danica M. Ommen 2023 Iowa State University

SDSU Data Science Symposium

Trained experts currently perform the handwriting analysis required in the criminal justice field, but this can introduce biases, delays, and expenses, leaving room for improvement. Prior research has sought to address this by analyzing handwriting through feature-based and score-based likelihood ratios for assessing evidence within a probabilistic framework. However, error rates are not well defined within this framework, which makes the method difficult to evaluate and can lead to more errors than expected when the approach is applied. This research explores a method for assessing handwriting within the Two-Stage framework, which allows for quantifying error rates as recommended by …


Biasing Estimator To Mitigate Multicollinearity In Linear Regression Model, Abdulrasheed Bello Badawaire, Issam Dawoud, Adewale Folaranmi Lukman, Victoria Laoye, Arowolo Olatunji 2023 Department of Mathematics and Statistics, Federal University Wukari, Wukari, Nigeria

Al-Bahir Journal for Engineering and Pure Sciences

A new two-parameter estimator was developed to combat multicollinearity in the linear regression model. Some necessary and sufficient conditions for the dominance of the proposed estimator over the ordinary least squares (OLS) estimator, ridge regression estimator, Liu estimator, KL estimator, and some two-parameter estimators are obtained in the matrix mean square error sense. Theory and simulation results show that, under some conditions, the proposed two-parameter estimator consistently dominates the other estimators considered in this study. The results of a real-life application agree.
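The baseline estimators this abstract compares have standard closed forms. Below is a minimal NumPy sketch of the OLS, ridge, Liu, and KL estimators, assuming the usual textbook definitions and simulated, nearly collinear data. The paper's proposed two-parameter estimator is not given in the abstract and is not reproduced here; the values k = 0.5 and d = 0.5 are arbitrary illustrations.

```python
import numpy as np

def ols(X, y):
    # Ordinary least squares: solve (X'X) b = X'y
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge(X, y, k):
    # Ridge estimator: (X'X + kI)^{-1} X'y, with k >= 0
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def liu(X, y, d):
    # Liu estimator: (X'X + I)^{-1} (X'y + d * b_OLS), usually with 0 < d < 1
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + np.eye(p), X.T @ y + d * ols(X, y))

def kl(X, y, k):
    # KL (Kibria-Lukman) estimator: (X'X + kI)^{-1} (X'X - kI) b_OLS
    p = X.shape[1]
    xtx = X.T @ X
    return np.linalg.solve(xtx + k * np.eye(p), (xtx - k * np.eye(p)) @ ols(X, y))

# Simulated data with two nearly collinear predictors (illustrative only)
rng = np.random.default_rng(0)
n = 100
z = rng.normal(size=n)
X = np.column_stack([z + 0.01 * rng.normal(size=n), z + 0.01 * rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

print("condition number of X'X:", np.linalg.cond(X.T @ X))
for name, b in [("OLS", ols(X, y)), ("ridge", ridge(X, y, 0.5)),
                ("Liu", liu(X, y, 0.5)), ("KL", kl(X, y, 0.5))]:
    print(name, b)
```

Setting k = 0 (ridge, KL) or d = 1 (Liu) recovers OLS, which is a quick sanity check. Choosing the biasing parameters in practice is a separate problem.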


A Statistical Analysis Of The Change In Age Distribution Of Spawning Hatchery Salmon, Rachel Macaulay, Emily Barrett, Grace Penunuri, Eli E. Goldwyn 2023 University of Portland

Spora: A Journal of Biomathematics

Declines in salmon size have been reported, primarily as a result of maturation at younger ages. This change in age distribution poses serious threats to salmon-dependent peoples and ecological systems. We perform a statistical analysis to examine the change in age structure of spawning Alaskan chum salmon Oncorhynchus keta and Chinook salmon O. tshawytscha using 30 years of hatchery data. To highlight the impacts of this change, we investigate the average number of fry/smolt that each age of spawning chum/Chinook salmon produces. Our findings demonstrate an increase in younger hatchery salmon returning to spawn, and fewer fry produced …


Beyond Statistical Significance: A Holistic View Of What Makes A Research Finding "Important", Jane E. Miller 2023 Rutgers, The State University of New Jersey

Numeracy

Students often believe that statistical significance is the only determinant of whether a quantitative result is “important.” In this paper, I review traditional null hypothesis statistical testing to identify what questions inferential statistics can and cannot answer, including statistical significance, effect size and direction, causality, generalizability, and changeability of the independent variable. I illustrate these issues with examples from an empirical study of the association between how much time teenagers spent playing video games and time spent reading. I describe how study design and context determine each of those aspects of “importance,” and close by summarizing how to provide a …
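To see why a statistically significant result need not be important, the sketch below uses hypothetical summary statistics (not data from the video-game study): it holds a 0.1-unit mean difference and an SD of 2 fixed and changes only the sample size. It relies on a large-sample z approximation with equal group sizes and equal SDs as a simplifying assumption.

```python
from math import erfc, sqrt

def z_test_and_effect(mean1, mean2, sd, n):
    # Large-sample two-sample z test assuming equal SDs and equal group sizes n
    se = sd * sqrt(2.0 / n)
    z = (mean1 - mean2) / se
    p_value = erfc(abs(z) / sqrt(2.0))  # two-sided p-value from the normal distribution
    d = (mean1 - mean2) / sd            # Cohen's d (standardized mean difference)
    return z, p_value, d

# Hypothetical numbers: same 0.1-unit difference, SD = 2, two different sample sizes
for n in (50, 20_000):
    z, p, d = z_test_and_effect(10.0, 10.1, 2.0, n)
    print(f"n={n:>6}: z={z:6.2f}, p={p:.2g}, d={d:.2f}")
```

The effect size is the same small d = -0.05 in both cases; only the p-value moves with the sample size.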


Establishing The Validity And Reliability Of The LOCUS Assessments, Tim Jacobbe, Bob delMas, Brad Hartlaub, Jeff Haberstroh, Catherine Case, Steven Foti, Douglas Whitaker 2023 Southern Methodist University

Numeracy

The development of assessments as part of the funded LOCUS project is described. The assessments measure students’ conceptual understanding of statistics as outlined in the GAISE PreK–12 Framework. Results are reported from a large-scale administration to 3,430 students in grades 6 through 12 in the United States. Items were designed to assess levels of understanding as well as components of the statistical problem solving process as articulated in the GAISE framework. We discuss details of how the model used to develop the LOCUS assessments guided the gathering of evidence for validity and reliability arguments. Three types of validity evidence are …


Supplementary Files For "Adaptive Mapping Of Design Ground Snow Loads In The Conterminous United States", Jadon Wagstaff, Jesse Wheeler, Brennan Bean, Marc Maguire, Yan Sun 2023 University of Utah

Browse all Datasets

Recent amendments to design ground snow load requirements in ASCE 7-22 have reduced the size of case study regions by 91% from what they were in ASCE 7-16, primarily in western states. This reduction is made possible through the development of highly accurate regional generalized additive regression models (RGAMs), stitched together with a novel smoothing scheme implemented in the R software package remap, to produce the continental-scale maps of reliability-targeted design ground snow loads available in ASCE 7-22. This approach allows for better characterizations of the changing relationship between temperature, elevation, and ground snow loads across the Conterminous United …


On Partially Observed Tensor Regression, Dinara Miftyakhetdinova 2023 University of Windsor

Major Papers

Tensor data are widely used in modern data science. The interest lies in identifying and characterizing the relationship between tensor datasets and external covariates. These datasets, though, are often incomplete. An efficient nonconvex alternating updating algorithm proposed by J. Zhou et al. in the paper "Partially Observed Dynamic Tensor Response Regression" provides a novel approach. The algorithm handles unobserved entries by minimizing a loss function under low-rankness, sparsity, and fusion constraints. This analysis aims to understand in detail the proposed algorithms and their theoretical proofs, potentially dropping some of the assumptions …


Informative Hypothesis For Group Means Comparison, Dr. Teck Kiang Tan 2023 National University of Singapore

Practical Assessment, Research, and Evaluation

Researchers who compare group means often have hypotheses concerning the state of affairs in the population from which they sampled their data. The classical frequentist approach carries out hypothesis testing with ANOVA, stating the null hypothesis that there is no difference in the means and proceeding with multiple comparisons if the null hypothesis is rejected. Because this approach can neither incorporate order, inequality, and direction into hypothesis testing nor specify multiple hypotheses, this paper introduces the informative hypothesis, which allows more flexibility in stating hypotheses and is …


Uniformity Test Based On The Empirical Bernstein Distribution, Ran Sun 2023 University of Windsor

Major Papers

In this paper, we first review the origin of the Bernstein polynomial and its various applications. Then we review the importance of goodness-of-fit tests, especially uniformity tests, and examine many of the test statistics proposed so far. After that, we propose two new statistics for testing uniformity, of Kolmogorov-Smirnov type and Cramér-von Mises type, respectively. We embed the Bernstein polynomial into these test types to take advantage of its strong approximation performance. Finally, we run a Monte Carlo simulation to compare the performance of our statistics to those …
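A rough sketch of the idea, assuming the Bernstein-smoothed empirical CDF F_{m,n}(x) = Σ_{k=0}^{m} F_n(k/m) C(m,k) x^k (1-x)^{m-k} is plugged into Kolmogorov-Smirnov-type and Cramér-von Mises-type distances from the uniform CDF F(x) = x. The exact form of the paper's two statistics, the degree m = 20, and the grid approximation of the supremum and integral are assumptions for illustration; critical values would come from a Monte Carlo simulation under the null.

```python
import numpy as np
from math import comb

def ecdf(sample, x):
    # Empirical CDF F_n evaluated at the points x
    s = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(s, x, side="right") / len(s)

def bernstein_ecdf(sample, x, m):
    # Bernstein-smoothed ECDF of degree m:
    #   sum_{k=0}^{m} F_n(k/m) * C(m, k) * x^k * (1 - x)^(m - k)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    k = np.arange(m + 1)
    coeffs = np.array([comb(m, j) for j in range(m + 1)], dtype=float)
    basis = coeffs * x[:, None] ** k * (1.0 - x[:, None]) ** (m - k)
    return basis @ ecdf(sample, k / m)

def ks_type(sample, m, grid_size=1000):
    # sup_x |F_{m,n}(x) - x|, approximated on a fine grid over [0, 1]
    grid = np.linspace(0.0, 1.0, grid_size + 1)
    return np.max(np.abs(bernstein_ecdf(sample, grid, m) - grid))

def cvm_type(sample, m, grid_size=1000):
    # n * integral_0^1 (F_{m,n}(x) - x)^2 dx, via the midpoint rule
    grid = (np.arange(grid_size) + 0.5) / grid_size
    diff = bernstein_ecdf(sample, grid, m) - grid
    return len(sample) * np.mean(diff ** 2)

rng = np.random.default_rng(42)
uniform_sample = rng.uniform(size=200)
skewed_sample = rng.uniform(size=200) ** 3  # clearly not uniform on [0, 1]
for name, s in [("uniform", uniform_sample), ("skewed", skewed_sample)]:
    print(name, ks_type(s, m=20), cvm_type(s, m=20))
```

Both statistics should be much larger for the skewed sample than for the uniform one; the smoothing degree m trades bias against variance.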


Optimal Speed Of A Machine In An Assembly Line Using The Continuous Time Markov Chain Rate Matrix, Chandi Darshani Rupasinghe 2023 University of Windsor

Major Papers

The optimal speed of a machine in an assembly line is determined using a Markov decision process type model. We develop the rate matrix that represents the inter-event time of a machine, either repair time or time to breakdown, as a function of speed. We consider the rate of time to breakdown with a variety of functions of speed. We find limiting probabilities and express profit in terms of these probabilities. We then find the optimal speed to maximize profit. Further, we assume an underlying function of speed and simulate data using R. From the simulated data, we estimate the …
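A hypothetical two-state version of this setup: the machine is either working or under repair, the breakdown rate grows with speed as a·s² (an assumed functional form), and repairs occur at a constant rate μ. The sketch builds the rate matrix Q, solves πQ = 0 with Σπ = 1 for the limiting probabilities, and grid-searches the speed that maximizes an assumed profit rate. All names and parameter values are illustrative, not taken from the paper.

```python
import numpy as np

def stationary_distribution(Q):
    # Solve pi Q = 0 subject to sum(pi) = 1 for a CTMC rate matrix Q
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def profit_rate(speed, r=10.0, c=0.0, a=0.5, mu=2.0):
    # Hypothetical two-state machine: state 0 = working, state 1 = under repair.
    # Breakdown rate a * speed^2 (assumed form); repair rate mu (assumed constant).
    lam = a * speed ** 2
    Q = np.array([[-lam, lam],
                  [mu, -mu]])
    pi_up, pi_down = stationary_distribution(Q)
    # Revenue r per unit of output while working; cost c per unit time under repair
    return r * speed * pi_up - c * pi_down

def best_speed(c=0.0, speeds=np.linspace(0.1, 5.0, 491)):
    # Grid search over candidate speeds
    profits = [profit_rate(s, c=c) for s in speeds]
    return speeds[int(np.argmax(profits))]

print("optimal speed, no repair cost:", best_speed(c=0.0))
print("optimal speed, repair cost 5:", best_speed(c=5.0))
```

With no repair cost the optimum is s = √(μ/a) = 2; charging for downtime pushes the optimal speed lower, because faster running spends more time in the repair state.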


Application Of Sentiment Analysis And Machine Learning Techniques To Predict Daily Cryptocurrency Price Returns, Edward Wu 2023 Claremont Colleges

CMC Senior Theses

This paper examines the effects of social media sentiment relating to Bitcoin on the daily price returns of Bitcoin and other popular cryptocurrencies by utilizing sentiment analysis and machine learning techniques to predict daily price returns. Many investors think that social media sentiment affects cryptocurrency prices. However, the results of this paper find that social media sentiment relating to Bitcoin does not add significant predictive value to forecasting daily price returns for each of the six cryptocurrencies used for analysis and that machine learning models that do not assume linearity between the current day price return and previous daily price …


Statistical Models For Decision-Making In Professional Soccer, Sean Hellingman 2023 Wilfrid Laurier University

Theses and Dissertations (Comprehensive)

As soccer is widely regarded as the most popular sport in the world, there is high interest in methods of improving team performance. There are many ways teams and individual athletes can influence their own performance during competition. This thesis focuses on developing statistical methodologies for improving competition-based decision-making in soccer, so as to allow professional soccer teams to make better-informed decisions regarding player selection and in-game decision-making.

To properly capture the dynamic actions of professional soccer, Markov chains with increasing complexity are proposed. These models allow for the inclusion of potential changes in the process caused by goals …


Medical Racism: Comparing Prenatal Care Across Races In The United States, Rubina Cheema 2022 DePauw University

Student Research

Prenatal care describes any care a woman receives during her pregnancy. It is intended to keep both the mother and the child healthy and also to reduce the risk of complications during and after birth. This care is especially important for women with high-risk factors so that doctors and nurses are able to monitor their health and the health of their baby throughout the pregnancy. For prenatal care to be most effective, it is imperative to begin prenatal care within the first trimester of a woman's pregnancy. However, in the United States, medical racism creates a major …


Study On Innovation Networks And Its Spillover Effect Of China’s New Energy Automobile Industry, Zhifei Xiong, Wenzhong Zhang 2022 Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China

Bulletin of Chinese Academy of Sciences (Chinese Version)

The network spillover effect of knowledge has been playing an increasingly significant role in the development of industrial innovation. The urban cooperation matrix of China’s new energy automobile industry is built from new energy automobile patent data, and the structure and evolution of China’s new energy automobile industry are depicted. On this basis, the spatial Durbin model (SDM) is used to calculate the network spillover effect, and its results are compared with spillover effects based on the spatial contiguity and distance of cities. The results show that the innovation activities of China’s new …
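In a spatial Durbin model, y = ρWy + Xβ + WXθ + ε, spillovers are usually summarized through the impact matrix S_k = (I - ρW)^{-1}(β_k I + θ_k W): the mean of its diagonal is the average direct effect, and the average row sum minus the direct effect is the indirect (spillover) effect. This minimal sketch uses a hypothetical ring of six cities with row-standardized weights, rather than the patent-based cooperation matrix in the study, and made-up values of ρ, β_k, and θ_k.

```python
import numpy as np

def ring_weights(n):
    # Row-standardized weights: each city's two ring neighbors get weight 1/2
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    return W

def sdm_effects(W, rho, beta_k, theta_k):
    # SDM: y = rho W y + X beta + W X theta + eps.
    # Impact matrix for regressor k: S_k = (I - rho W)^{-1} (beta_k I + theta_k W)
    n = W.shape[0]
    S = np.linalg.solve(np.eye(n) - rho * W, beta_k * np.eye(n) + theta_k * W)
    direct = np.trace(S) / n      # average own-city effect
    total = S.sum() / n           # average total effect
    return direct, total - direct, total   # indirect = spillover

W = ring_weights(6)
print(sdm_effects(W, rho=0.4, beta_k=1.0, theta_k=0.5))
```

With a row-standardized W the total effect equals (β_k + θ_k)/(1 - ρ), so only the split between direct and indirect effects depends on the network structure.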


Regression Modeling Of Complex Survival Data Based On Pseudo-Observations, Rong Rong 2022 Southern Methodist University

Statistical Science Theses and Dissertations

The restricted mean survival time (RMST) is a clinically meaningful summary measure in studies with survival outcomes. Statistical methods have been developed for regression analysis of RMST to investigate impacts of covariates on RMST, which is a useful alternative to the Cox regression analysis. However, existing methods for regression modeling of RMST are not applicable to left-truncated right-censored data that arise frequently in prevalent cohort studies, for which the sampling bias due to left truncation and informative censoring induced by the prevalent sampling scheme must be properly addressed. Meanwhile, statistical methods have been developed for regression modeling of the cumulative …
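The two ingredients named in this abstract, RMST and pseudo-observations, can be sketched for ordinary right-censored data: RMST is the area under the Kaplan-Meier curve up to a horizon τ, and jackknife pseudo-values n·θ̂ - (n-1)·θ̂_{-i} turn it into a per-subject response that can be regressed on covariates. This sketch does not handle the left truncation or informative censoring that the dissertation addresses. The data, τ = 9, and the plain least-squares fit (standing in for the GEE step usually used with pseudo-observations) are hypothetical choices.

```python
import numpy as np

def km_rmst(time, event, tau):
    # Restricted mean survival time: area under the Kaplan-Meier curve on [0, tau]
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    surv, area, prev_t = 1.0, 0.0, 0.0
    for t in np.unique(time[event]):          # distinct event times, ascending
        if t > tau:
            break
        area += surv * (t - prev_t)
        at_risk = np.sum(time >= t)
        deaths = np.sum((time == t) & event)
        surv *= 1.0 - deaths / at_risk
        prev_t = t
    return area + surv * (tau - prev_t)

def rmst_pseudo_observations(time, event, tau):
    # Jackknife pseudo-values: n * theta_hat - (n - 1) * theta_hat_without_i
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    n = len(time)
    full = km_rmst(time, event, tau)
    keep = np.ones(n, dtype=bool)
    pseudo = np.empty(n)
    for i in range(n):
        keep[i] = False
        pseudo[i] = n * full - (n - 1) * km_rmst(time[keep], event[keep], tau)
        keep[i] = True
    return pseudo

# Hypothetical right-censored data with one binary covariate (no left truncation)
time = np.array([2.0, 3.5, 4.0, 5.0, 6.5, 7.0, 8.0, 9.5, 10.0, 12.0])
event = np.array([1, 1, 0, 1, 1, 0, 1, 1, 0, 1])
group = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
tau = 9.0

pseudo = rmst_pseudo_observations(time, event, tau)
# Identity-link regression of the pseudo-values on the covariate via least squares
design = np.column_stack([np.ones(len(time)), group])
coef = np.linalg.lstsq(design, pseudo, rcond=None)[0]
print("RMST pseudo-observations:", np.round(pseudo, 2))
print("intercept, group effect:", coef)
```

Without censoring, each pseudo-value reduces exactly to min(T_i, τ), which is a useful check on the implementation.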


Kernel Estimation Of Spot Volatility And Its Application In Volatility Functional Estimation, Bei Wu 2022 Washington University in St. Louis

Arts & Sciences Electronic Theses and Dissertations

Itô semimartingale models for the dynamics of asset returns have been widely studied in financial econometrics. A key component of the model, spot volatility, plays a crucial role in option pricing, portfolio management, and financial risk assessment. In this dissertation, we consider three problems related to the estimation of spot volatility using high-frequency asset returns. We first revisit the problem of estimating the spot volatility of an Itô semimartingale using a kernel estimator. We prove a Central Limit Theorem with an optimal convergence rate for a general two-sided kernel under quite mild assumptions, which accommodate leverage effects and jumps of …
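A minimal sketch of a two-sided kernel spot-variance estimator, σ̂²(τ) = Σ_i K_h(t_{i-1} - τ)(ΔX_i)² with K_h(u) = K(u/h)/h, applied to a simulated continuous path. The Gaussian kernel, bandwidth h = 0.05, and volatility function σ(t) = 0.2 + 0.2t are assumptions for illustration; the simulated path has no jumps or leverage, both of which the dissertation's theory allows for.

```python
import numpy as np

def gaussian_kernel(u):
    # Standard normal density, used as a two-sided kernel
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def spot_variance(times, x, tau, h):
    # Kernel spot-variance estimate at time tau:
    #   sum_i K_h(t_{i-1} - tau) * (X_{t_i} - X_{t_{i-1}})^2
    dx = np.diff(x)
    weights = gaussian_kernel((times[:-1] - tau) / h) / h
    return np.sum(weights * dx ** 2)

def sigma(t):
    # Hypothetical deterministic volatility path
    return 0.2 + 0.2 * t

# Simulated continuous path with time-varying volatility (no jumps, no leverage)
rng = np.random.default_rng(1)
n = 23_400                              # e.g. one-second returns over a 6.5-hour day
times = np.linspace(0.0, 1.0, n + 1)
dt = times[1] - times[0]
increments = sigma(times[:-1]) * np.sqrt(dt) * rng.standard_normal(n)
x = np.concatenate([[0.0], np.cumsum(increments)])

est = spot_variance(times, x, tau=0.5, h=0.05)
print("estimate:", est, "true sigma^2(0.5):", sigma(0.5) ** 2)
```

At τ = 0.5 the true spot variance is 0.09, and the estimate should land close to it. Near the endpoints of the interval a two-sided kernel loses about half its mass, so the raw estimator is biased downward there.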


Contribution To Data Science: Time Series, Uncertainty Quantification And Applications, Dhrubajyoti Ghosh 2022 Washington University in St. Louis

Arts & Sciences Electronic Theses and Dissertations

Time series analysis is an essential tool in modern statistical analysis, as many real data problems have temporal components that must be studied to better understand the temporal dependence structure in the data. For example, in the stock market, it is of significant importance to identify the ups and downs of stock prices, for which time series analysis is crucial. Most of the existing literature on time series deals with linear time series or assumes Gaussianity. However, there are multiple instances where the time series shows nonlinear trends, or when the …


Dealing With Dimensionality: Problems And Techniques In High-Dimensional Statistics, Cezareo Rodriguez 2022 Washington University in St. Louis

Arts & Sciences Electronic Theses and Dissertations

In modern data analysis, problems involving high-dimensional data with more variables than subjects are increasingly common. Two such cases are mediation analysis and distributed optimization. In Chapter 2 we start with an overview of high-dimensional statistics and mediation analysis. In Chapter 3 we motivate and prove properties of a new marginal screening procedure for high-dimensional mediation analysis. Simulations show that this screening procedure performs better than benchmark approaches, and it is applied to a DNA methylation study. In Chapter 4 we construct a cryptosystem that accurately performs distributed penalized quantile regression in the high-dimensional setting …


Predictors Of COVID-19 Vaccination Rate In USA: A Machine Learning Approach, Syed M. I. Osman, Ahmed Sabit 2022 Sacred Heart University

WCBT Faculty Publications

In this study, we examine the state-level features and policies that are most important in achieving a threshold vaccination rate to curb the effects of the COVID-19 pandemic. We employ CHAID, a decision tree algorithm, on three different model specifications to answer this question, based on a dataset that includes all the states in the United States. Workplace travel emerges as the most important predictor; however, the governors’ political affiliation (PA) replaces it in a more conservative feature set that includes economic features and the growth rate of COVID-19 cases. We also employ several alternative algorithms as a robustness check. …


Examining The Impact Of COVID-19 On The Education And Development Of American Students, Riley Fortin '25 2022 DePauw University

Student Research

After the COVID-19 pandemic, the vast majority of American children have fallen behind in core subjects due to the ultimate ineffectiveness of remote learning. This study attempts to determine the degree to which children have fallen behind through the trends in the National Assessment of Educational Progress’s two most recent testing years. A database accessed from Google has been analyzed, filtered by state, and visualized in tables in order to identify any possible trends resulting from the remote learning brought on by the pandemic. By looking at data in seven different states across the country, there is a notable …


Digital Commons powered by bepress