Open Access. Powered by Scholars. Published by Universities.®

Probability Commons


Articles 31 - 60 of 105

Full-Text Articles in Probability

Applying The Data: Predictive Analytics In Sport, Anthony Teeter, Margo Bergman Nov 2020

Applying The Data: Predictive Analytics In Sport, Anthony Teeter, Margo Bergman

Access*: Interdisciplinary Journal of Student Research and Scholarship

The history of wagering predictions and their impact on wide-reaching disciplines such as statistics and economics dates to at least the 1700s, if not before. Predicting the outcomes of sports is a multibillion-dollar business that capitalizes on these tools but is in constant development with the addition of big data analytics methods. Sportsline.com, a popular website for fantasy sports leagues, provides odds predictions in multiple sports, produces proprietary computer models of both winning and losing teams, and provides specific point estimates. To test likely candidates for inclusion in these prediction algorithms, the authors developed a computer model and test …


Lectures On Mathematical Computing With Python, Jay Gopalakrishnan Jul 2020

Lectures On Mathematical Computing With Python, Jay Gopalakrishnan

PDXOpen: Open Educational Resources

This open resource is a collection of class activities for use in undergraduate courses aimed at teaching mathematical computing, and computational thinking in general, using the Python programming language. It was developed for a second-year course (MTH 271) revamped for a new undergraduate program in data science at Portland State University. The activities are designed to guide students' use of Python modules effectively for scientific computation, data analysis, and visualization.

Adopt/Adapt
If you are an instructor adopting or adapting this open educational resource, please help us understand your use by filling out this form.
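A minimal sketch of the kind of class activity the resource describes, pairing NumPy computation with Matplotlib visualization; the specific task (simulating and plotting a random walk) is an illustrative assumption, not an exercise taken from the MTH 271 materials.

```python
# Illustrative class-activity-style exercise (not from the actual MTH 271 materials):
# simulate a simple random walk with NumPy and visualize it with Matplotlib.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
steps = rng.choice([-1, 1], size=1000)   # fair coin-flip steps
walk = np.cumsum(steps)                  # position after each step

plt.plot(walk)
plt.xlabel("step")
plt.ylabel("position")
plt.title("A simple random walk")
plt.show()
```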


Inferences For Weibull-Gamma Distribution In Presence Of Partially Accelerated Life Test, Mahmoud Mansour, M A W Mahmoud Prof., Rashad El-Sagheer Mar 2020

Inferences For Weibull-Gamma Distribution In Presence Of Partially Accelerated Life Test, Mahmoud Mansour, M A W Mahmoud Prof., Rashad El-Sagheer

Basic Science Engineering

In this paper, the aim is to derive point and interval estimates for the parameters of the Weibull-Gamma distribution (WGD) using a progressively Type-II censored (PROG-II-C) sample under a step-stress partially accelerated life test (SSPALT) model. The maximum likelihood (ML), Bayes, and four parametric bootstrap methods are used to obtain point estimates of the distribution parameters and the acceleration factor. Furthermore, the approximate confidence intervals (ACIs), four bootstrap confidence intervals, and credible intervals of the estimators are obtained. The Bayes estimators are computed under the squared error loss (SEL) function using Markov Chain Monte Carlo (MCMC) …
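As a rough illustration of the bootstrap step, the sketch below computes a parametric percentile-bootstrap confidence interval for a Weibull shape parameter with SciPy. The plain Weibull fit on complete data is a stand-in assumption; the paper's actual setting involves the Weibull-Gamma distribution under progressive Type-II censoring and step-stress acceleration.

```python
# Parametric percentile-bootstrap CI sketch (a stand-in for the paper's setting:
# a plain Weibull fit on complete data, not the Weibull-Gamma model under
# progressive Type-II censoring and step-stress acceleration).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = stats.weibull_min.rvs(c=1.5, scale=2.0, size=100, random_state=rng)

# ML estimates of the shape and scale (location fixed at 0)
c_hat, loc_hat, scale_hat = stats.weibull_min.fit(data, floc=0)

B = 2000
boot_shapes = np.empty(B)
for b in range(B):
    resample = stats.weibull_min.rvs(c=c_hat, scale=scale_hat, size=len(data),
                                     random_state=rng)
    boot_shapes[b], _, _ = stats.weibull_min.fit(resample, floc=0)

lo, hi = np.percentile(boot_shapes, [2.5, 97.5])
print(f"shape MLE = {c_hat:.3f}, 95% percentile bootstrap CI = ({lo:.3f}, {hi:.3f})")
```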


Evaluating An Ordinal Output Using Data Modeling, Algorithmic Modeling, And Numerical Analysis, Martin Keagan Wynne Brown Jan 2020

Evaluating An Ordinal Output Using Data Modeling, Algorithmic Modeling, And Numerical Analysis, Martin Keagan Wynne Brown

Murray State Theses and Dissertations

Data and algorithmic modeling are two different approaches used in predictive analytics. The models discussed from these two approaches include the proportional odds logit model (POLR), the vector generalized linear model (VGLM), the classification and regression tree model (CART), and the random forests model (RF). Patterns in the data were analyzed using trigonometric polynomial approximations and Fast Fourier Transforms. Predictive modeling is used frequently in statistics and data science to find the relationship between the explanatory (input) variables and a response (output) variable. Both approaches prove advantageous in different cases depending on the data set. In our case, the data …
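A small sketch of the Fourier step mentioned above: using NumPy's real FFT to pick out the dominant frequency in a noisy periodic signal. The synthetic signal is an assumption for illustration and not the thesis's data.

```python
# Sketch of the Fourier step: find the dominant frequency in a noisy periodic
# signal with NumPy's real FFT (synthetic data, not the thesis's data set).
import numpy as np

rng = np.random.default_rng(2)
n = 256
t = np.arange(n)
signal = 2.0 * np.sin(2 * np.pi * 5 * t / n) + rng.normal(scale=0.5, size=n)

spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(n, d=1.0)        # cycles per sample
power = np.abs(spectrum) ** 2
power[0] = 0.0                           # ignore the mean (zero-frequency) term

dominant = freqs[np.argmax(power)]
print(f"dominant frequency ≈ {dominant:.4f} cycles/sample "
      f"(true value 5/{n} = {5 / n:.4f})")
```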


Predicting Diabetes Diagnoses, Sarah Netchert Jan 2020

Predicting Diabetes Diagnoses, Sarah Netchert

Student Research Poster Presentations 2020

This study explored the traits and health state of African Americans in central Virginia in order to determine which traits put people at a higher probability of being diagnosed with diabetes. We also want to know which traits will generate the highest probability that a person will be diagnosed with diabetes. Traits that were included and used in this study were cholesterol, stabilized glucose, high density lipoprotein levels, age (years), gender, height (inches), weight (pounds), systolic blood pressure, diastolic blood pressure, waist size (inches), and hip size (inches). There were 403 individuals included in the study since they were the only ones screened for diabetes out of 1,046 …
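A hedged sketch of this kind of analysis: a logistic regression that turns a few of the listed traits into an estimated probability of a diabetes diagnosis. The data below are synthetic stand-ins, not the 403 screened individuals.

```python
# Hedged sketch: a logistic-regression estimate of diagnosis probability from a
# few of the listed traits, on synthetic data (the real study's 403 screened
# individuals are not reproduced here).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 400
X = np.column_stack([
    rng.normal(100, 40, n),   # stabilized glucose
    rng.normal(50, 15, n),    # age in years
    rng.normal(38, 6, n),     # waist size in inches
])
# Synthetic outcome: higher glucose, age, and waist size raise the odds.
logit = -12 + 0.06 * X[:, 0] + 0.03 * X[:, 1] + 0.08 * X[:, 2]
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

model = LogisticRegression(max_iter=1000).fit(X, y)
print("coefficients:", model.coef_[0])
print("estimated P(diabetes) for first 5 people:", model.predict_proba(X[:5])[:, 1])
```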


Shrinkage Priors For Isotonic Probability Vectors And Binary Data Modeling, Philip S. Boonstra, Daniel R. Owen, Jian Kang Jan 2020

Shrinkage Priors For Isotonic Probability Vectors And Binary Data Modeling, Philip S. Boonstra, Daniel R. Owen, Jian Kang

The University of Michigan Department of Biostatistics Working Paper Series

This paper outlines a new class of shrinkage priors for Bayesian isotonic regression modeling a binary outcome against a predictor, where the probability of the outcome is assumed to be monotonically non-decreasing with the predictor. The predictor is categorized into a large number of groups, and the set of differences between outcome probabilities in consecutive categories is equipped with a multivariate prior having support over the set of simplexes. The Dirichlet distribution, which can be derived from a normalized cumulative sum of gamma-distributed random variables, is a natural choice of prior, but using mathematical and simulation-based arguments, we show that …
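A quick numerical illustration of the construction mentioned above: normalizing independent gamma draws yields a Dirichlet sample, and its cumulative sums give a monotonically non-decreasing (isotonic) probability vector. The concentration values are arbitrary.

```python
# Small numerical check: normalizing independent gamma draws yields a draw from
# a Dirichlet distribution, and its cumulative sums form a non-decreasing
# probability vector.
import numpy as np

rng = np.random.default_rng(4)
concentration = np.array([0.5, 1.0, 2.0, 1.5])

g = rng.gamma(shape=concentration, scale=1.0)
simplex_point = g / g.sum()                 # one Dirichlet(concentration) draw
isotonic_probs = np.cumsum(simplex_point)   # non-decreasing, ends at 1

print("simplex point:", simplex_point, "sum =", simplex_point.sum())
print("isotonic probabilities:", isotonic_probs)

# Sanity check against the theoretical Dirichlet mean (they should agree).
many = rng.gamma(shape=concentration, scale=1.0, size=(100_000, 4))
print("mean of normalized gammas:", (many / many.sum(axis=1, keepdims=True)).mean(axis=0))
print("theoretical Dirichlet mean:", concentration / concentration.sum())
```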


How Machine Learning And Probability Concepts Can Improve NBA Player Evaluation, Harrison Miller Jan 2020

How Machine Learning And Probability Concepts Can Improve NBA Player Evaluation, Harrison Miller

CMC Senior Theses

In this paper I will be breaking down a scholarly article, written by Sameer K. Deshpande and Shane T. Jensen, that proposed a new method to evaluate NBA players. The NBA, the National Basketball Association, is the highest-level professional basketball league in America. They proposed to build a model that estimates how NBA players impact their teams' chances of winning a game, using machine learning and probability concepts. I preface that by diving into these concepts and their mathematical backgrounds. These concepts include building a linear model using the ordinary least squares method, the bias …
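For the ordinary least squares piece, a minimal sketch with NumPy on synthetic data; it shows the generic OLS fit, not the Deshpande-Jensen player-evaluation model itself.

```python
# Minimal ordinary least squares sketch with NumPy (synthetic data).
import numpy as np

rng = np.random.default_rng(5)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 covariates
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# OLS solves min_beta ||y - X beta||^2; lstsq returns the least-squares solution.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("true coefficients:     ", beta_true)
print("estimated coefficients:", beta_hat.round(3))
```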


Using Random Forests To Estimate Win Probability Before Each Play Of An NFL Game, Dennis Lock, Dan Nettleton Jul 2019

Using Random Forests To Estimate Win Probability Before Each Play Of An NFL Game, Dennis Lock, Dan Nettleton

Dan Nettleton

Before any play of a National Football League (NFL) game, the probability that a given team will win depends on many situational variables (such as time remaining, yards to go for a first down, field position and current score) as well as the relative quality of the two teams as quantified by the Las Vegas point spread. We use a random forest method to combine pre-play variables to estimate Win Probability (WP) before any play of an NFL game. When a subset of NFL play-by-play data for the 12 seasons from 2001 to 2012 is used as a training dataset, …
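A hedged sketch of the general approach: a random forest maps pre-play situational variables to a win probability through predict_proba. The feature set and data below are synthetic assumptions, not the 2001-2012 NFL play-by-play training data.

```python
# Hedged sketch: a random forest maps pre-play situational variables to a win
# probability via predict_proba (synthetic stand-in data).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(6)
n = 5000
seconds_left = rng.uniform(0, 3600, n)
score_diff = rng.normal(0, 10, n)          # offense score minus defense score
yards_to_go = rng.integers(1, 11, n)
point_spread = rng.normal(0, 6, n)

X = np.column_stack([seconds_left, score_diff, yards_to_go, point_spread])
# Synthetic label: leading late and being favored makes a win more likely.
logit = 0.15 * score_diff * (1 - seconds_left / 3600 + 0.2) - 0.1 * point_spread
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
new_play = [[120.0, 3.0, 7, -2.5]]   # 2 minutes left, up 3, 7 to go, 2.5-point favorite
print("estimated win probability:", rf.predict_proba(new_play)[0, 1])
```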


Allocative Poisson Factorization For Computational Social Science, Aaron Schein Jul 2019

Allocative Poisson Factorization For Computational Social Science, Aaron Schein

Doctoral Dissertations

Social science data often comes in the form of high-dimensional discrete data such as categorical survey responses, social interaction records, or text. These data sets exhibit high degrees of sparsity, missingness, overdispersion, and burstiness, all of which present challenges to traditional statistical modeling techniques. The framework of Poisson factorization (PF) has emerged in recent years as a natural way to model high-dimensional discrete data sets. This framework assumes that each observed count in a data set is a Poisson random variable $y \sim \mathrm{Pois}(\mu)$ whose rate parameter $\mu$ is a function of shared model parameters. This thesis examines a specific …
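A generative sketch of the PF assumption stated above, with gamma-distributed latent factors and Poisson counts; the dimensions and hyperparameters are illustrative choices, not those used in the thesis.

```python
# Generative sketch of Poisson factorization: each count y_ij ~ Pois(mu_ij),
# where mu_ij is a dot product of non-negative, gamma-distributed factors.
import numpy as np

rng = np.random.default_rng(7)
n_rows, n_cols, n_factors = 50, 40, 3

theta = rng.gamma(shape=0.3, scale=1.0, size=(n_rows, n_factors))  # row factors
beta = rng.gamma(shape=0.3, scale=1.0, size=(n_cols, n_factors))   # column factors
mu = theta @ beta.T                                                # rate matrix
Y = rng.poisson(mu)                                                # observed counts

print("fraction of zeros (sparsity):", np.mean(Y == 0).round(3))
print("max count (burstiness):", Y.max())
```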


A Statistical Analysis Of The Roulette Martingale System: Examples, Formulas And Simulations With R, Peter Pflaumer May 2019

A Statistical Analysis Of The Roulette Martingale System: Examples, Formulas And Simulations With R, Peter Pflaumer

International Conference on Gambling & Risk Taking

Some gamblers use a martingale or doubling strategy as a way of improving their chances of winning. This paper derives important formulas for the martingale strategy, such as the distribution, the expected value, and the standard deviation of the profit, the risk of a loss, and the expected bet of one or multiple martingale rounds. A computer simulation study of the doubling strategy with R is presented. The results of doubling are compared to gambling with a constant-sized bet on simple chances (red or black numbers, even or odd numbers, and low (1–18) or high (19–36) numbers) and …
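The paper's simulation study uses R; the sketch below is a Python analogue of a single martingale round on a European wheel (win probability 18/37 on a simple chance), doubling after each loss up to an assumed cap of ten bets.

```python
# Python analogue of one martingale round on a European wheel (the paper's own
# simulations are written in R). The ten-bet cap is an assumed bankroll limit.
import numpy as np

def martingale_round(rng, p_win=18 / 37, max_bets=10):
    """Return the profit of one doubling round: +1 unit on a win,
    -(2**max_bets - 1) units if every bet in the round is lost."""
    bet = 1
    total_lost = 0
    for _ in range(max_bets):
        if rng.random() < p_win:
            return bet - total_lost      # a win recovers all losses plus 1 unit
        total_lost += bet
        bet *= 2                         # double the stake after a loss
    return -total_lost                   # ran out of allowed doublings

rng = np.random.default_rng(8)
profits = np.array([martingale_round(rng) for _ in range(200_000)])
print("mean profit per round:", profits.mean().round(4))
print("standard deviation:   ", profits.std().round(2))
print("probability of losing a round:", (profits < 0).mean().round(5))
```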


Paper Structure Formation Simulation, Tyler R. Seekins May 2019

Paper Structure Formation Simulation, Tyler R. Seekins

Electronic Theses and Dissertations

On the surface, paper appears simple, but closer inspection yields a rich collection of chaotic dynamics and random variables. Predictive simulation of paper product properties is desirable for screening candidate experiments and optimizing recipes but existing models are inadequate for practical use. We present a novel structure simulation and generation system designed to narrow the gap between mathematical model and practical prediction. Realistic inputs to the system are preserved as randomly distributed variables. Rapid fiber placement (~1 second/fiber) is achieved with probabilistic approximation of chaotic fluid dynamics and minimization of potential energy to determine flexible fiber conformations. Resulting digital packed …


Leveraging Natural Language Processing Applications And Microblogging Platform For Increased Transparency In Crisis Areas, Ernesto Carrera-Ruvalcaba, Johnson Ekedum, Austin Hancock, Ben Brock May 2019

Leveraging Natural Language Processing Applications And Microblogging Platform For Increased Transparency In Crisis Areas, Ernesto Carrera-Ruvalcaba, Johnson Ekedum, Austin Hancock, Ben Brock

SMU Data Science Review

Through microblogging applications, such as Twitter, people actively document their lives even in times of natural disasters such as hurricanes and earthquakes. While first responders and crisis-teams are able to help people who call 911, or arrive at a designated shelter, there are vast amounts of information being exchanged online via Twitter that provide real-time, location-based alerts that are going unnoticed. To effectively use this information, the Tweets must be verified for authenticity and categorized to ensure that the proper authorities can be alerted. In this paper, we create a Crisis Message Corpus from geotagged Tweets occurring during 7 hurricanes …


Predictive Distributions Via Filtered Historical Simulation For Financial Risk Management, Tyson Clark May 2019

Predictive Distributions Via Filtered Historical Simulation For Financial Risk Management, Tyson Clark

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Filtered historical simulation with an underlying GARCH process can be used as a valuable tool in VaR analysis, as it produces a predictive density whose risk estimates are sensitive to the distributional properties of the historical data. I examine the applications to risk analysis that filtered historical simulation can provide, as well as an interpretation of the predictive density as a poor man’s Bayesian posterior distribution. The predictive density allows us to make associated probabilistic statements regarding the results of VaR analysis, giving a richer measurement of risk and the ability to maintain the optimal level of risk per …
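A simplified NumPy sketch of filtered historical simulation for a one-day VaR: returns are devolatilized with a GARCH(1,1)-style filter using assumed (not fitted) parameters, the standardized residuals are resampled, and they are rescaled by the next-period volatility forecast. The report itself estimates the GARCH model rather than fixing its parameters.

```python
# Simplified filtered-historical-simulation sketch (assumed GARCH(1,1)
# parameters; a real application would estimate them from the data).
import numpy as np

rng = np.random.default_rng(9)
returns = rng.standard_t(df=5, size=1500) * 0.01        # synthetic daily returns

omega, alpha, beta = 1e-6, 0.08, 0.90                   # assumed GARCH(1,1) parameters
sigma2 = np.empty(len(returns))
sigma2[0] = returns.var()
for t in range(1, len(returns)):
    sigma2[t] = omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1]

std_resid = returns / np.sqrt(sigma2)                   # filtered (devolatilized) shocks
sigma2_next = omega + alpha * returns[-1] ** 2 + beta * sigma2[-1]

# Predictive distribution: resampled shocks scaled by the forecast volatility.
simulated = np.sqrt(sigma2_next) * rng.choice(std_resid, size=100_000, replace=True)
var_99 = -np.percentile(simulated, 1)
print(f"one-day 99% VaR: {var_99:.4%} of portfolio value")
```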


Best Probable Subset: A New Method For Reducing Data Dimensionality In Linear Regression, Elieser Nodarse Apr 2019

Best Probable Subset: A New Method For Reducing Data Dimensionality In Linear Regression, Elieser Nodarse

FIU Electronic Theses and Dissertations

Regression is a statistical technique for modeling the relationship between a dependent variable Y and two or more predictor variables, also known as regressors. In the broad field of regression, there exists a special case in which the relationship between the dependent variable and the regressor(s) is linear. This is known as linear regression.

The purpose of this paper is to create a useful method that effectively selects a subset of regressors when dealing with high-dimensional data and/or collinearity in linear regression. As the name suggests, high-dimensional data occurs when the number of predictor variables is far …
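For context, the sketch below runs a classical exhaustive best-subset search, not the thesis's best probable subset method: every subset of candidate regressors is fit by OLS and the one with the smallest AIC is kept. The data and the AIC criterion are illustrative assumptions.

```python
# Classical exhaustive best-subset selection for OLS (illustrative baseline,
# not the thesis's "best probable subset" method).
import itertools
import numpy as np

rng = np.random.default_rng(10)
n, p = 120, 6
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=n)   # only columns 0 and 2 matter

def aic(subset):
    """AIC of the OLS fit using the given tuple of column indices."""
    Xs = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    rss = np.sum((y - Xs @ beta) ** 2)
    k = Xs.shape[1]
    return n * np.log(rss / n) + 2 * k

best = min((s for r in range(1, p + 1) for s in itertools.combinations(range(p), r)),
           key=aic)
print("selected regressors:", best)        # expected: (0, 2)
```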


Unified Methods For Feature Selection In Large-Scale Genomic Studies With Censored Survival Outcomes, Lauren Spirko-Burns, Karthik Devarajan Mar 2019

Unified Methods For Feature Selection In Large-Scale Genomic Studies With Censored Survival Outcomes, Lauren Spirko-Burns, Karthik Devarajan

COBRA Preprint Series

One of the major goals in large-scale genomic studies is to identify genes with a prognostic impact on time-to-event outcomes which provide insight into the disease's process. With rapid developments in high-throughput genomic technologies in the past two decades, the scientific community is able to monitor the expression levels of tens of thousands of genes and proteins resulting in enormous data sets where the number of genomic features is far greater than the number of subjects. Methods based on univariate Cox regression are often used to select genomic features related to survival outcome; however, the Cox model assumes proportional hazards …
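A sketch of the univariate Cox screening step, assuming the lifelines package and a handful of synthetic "genes"; real studies screen tens of thousands of features and adjust for multiple testing.

```python
# Univariate Cox screening sketch with the lifelines package (synthetic data).
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(11)
n, n_genes = 300, 5
genes = rng.normal(size=(n, n_genes))
# Gene 0 shortens survival; the rest are noise.
time = rng.exponential(scale=np.exp(-0.8 * genes[:, 0]))
event = rng.binomial(1, 0.7, n)                  # 1 = event observed, 0 = censored

pvalues = {}
for j in range(n_genes):
    df = pd.DataFrame({"time": time, "event": event, "gene": genes[:, j]})
    cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
    pvalues[f"gene_{j}"] = cph.summary.loc["gene", "p"]

print(pvalues)   # gene_0 should have by far the smallest p-value
```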


Non Parametric Test For Testing Exponentiality Against Exponential Better Than Used In Laplace Transform Order, Mahmoud Mansour, M A W Mahmoud Prof. Mar 2019

Non Parametric Test For Testing Exponentiality Against Exponential Better Than Used In Laplace Transform Order, Mahmoud Mansour, M A W Mahmoud Prof.

Basic Science Engineering

In this paper, a test statistic for testing exponentiality against exponential better than used in Laplace transform order (EBUL), based on the Laplace transform technique, is proposed. Pitman’s asymptotic efficiency of our test is calculated and compared with other tests. The percentiles of this test are tabulated. The powers of the test are estimated for distributions commonly used in aging problems. In the case of censored data, our test is applied and the percentiles are also calculated and tabulated. Finally, real examples in different areas are utilized as practical applications for the proposed test.


Modeling Stochastically Intransitive Relationships In Paired Comparison Data, Ryan Patrick Alexander McShane Jan 2019

Modeling Stochastically Intransitive Relationships In Paired Comparison Data, Ryan Patrick Alexander McShane

Statistical Science Theses and Dissertations

If the Warriors beat the Rockets and the Rockets beat the Spurs, does that mean that the Warriors are better than the Spurs? Sophisticated fans would argue that the Warriors are better by the transitive property, but could Spurs fans make a legitimate argument that their team is better despite this chain of evidence?

We first explore the nature of intransitive (rock-scissors-paper) relationships with a graph theoretic approach to the method of paired comparisons framework popularized by Kendall and Smith (1940). Then, we focus on the setting where all pairs of items, teams, players, or objects have been compared to …
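A pure-Python sketch of the graph-theoretic idea: treat pairwise wins as a tournament and search for intransitive (rock-scissors-paper) triads. The teams and results are made up to close one such cycle.

```python
# Represent head-to-head wins as a tournament and count intransitive triads.
from itertools import combinations

# beats[a] is the set of teams that a beat head-to-head (made-up results).
beats = {
    "Warriors": {"Rockets"},
    "Rockets": {"Spurs"},
    "Spurs": {"Warriors"},   # closes an intransitive cycle
    "Jazz": set(),
}

def intransitive_triads(beats):
    """Return all 3-cycles a -> b -> c -> a in the tournament."""
    cycles = []
    for a, b, c in combinations(beats, 3):
        for x, y, z in [(a, b, c), (a, c, b)]:   # the two cyclic orientations
            if y in beats[x] and z in beats[y] and x in beats[z]:
                cycles.append((x, y, z))
    return cycles

print(intransitive_triads(beats))   # [('Warriors', 'Rockets', 'Spurs')]
```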


Counting And Coloring Sudoku Graphs, Kyle Oddson Jan 2019

Counting And Coloring Sudoku Graphs, Kyle Oddson

Mathematics and Statistics Dissertations, Theses, and Final Project Papers

A sudoku puzzle is most commonly a 9 × 9 grid of 3 × 3 boxes wherein the puzzle player writes the numbers 1–9 with no repetition in any row, column, or box. We generalize the notion of the n² × n² sudoku grid for all n ∈ ℤ with n ≥ 2 and codify the empty sudoku board as a graph. In the main section of this paper we prove that sudoku boards and sudoku graphs exist for all such n; we prove the equivalence of [3]'s construction using unions and products of graphs to the definition of …
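A short sketch of "codifying the empty sudoku board as a graph": vertices are the cells of the n² × n² grid, with edges joining cells that share a row, column, or box. The check below uses n = 2 (the 4 × 4 board); only the construction is illustrated, not the paper's proofs.

```python
# Build the sudoku graph: cells are vertices; edges join cells in the same
# row, column, or box. For n = 3 (the 9 x 9 board) the graph is 20-regular.
from itertools import combinations

def sudoku_graph(n):
    side = n * n
    cells = [(r, c) for r in range(side) for c in range(side)]
    edges = set()
    for (r1, c1), (r2, c2) in combinations(cells, 2):
        same_row = r1 == r2
        same_col = c1 == c2
        same_box = (r1 // n, c1 // n) == (r2 // n, c2 // n)
        if same_row or same_col or same_box:
            edges.add(((r1, c1), (r2, c2)))
    return cells, edges

cells, edges = sudoku_graph(2)          # the 4 x 4 board
degree = {v: 0 for v in cells}
for u, v in edges:
    degree[u] += 1
    degree[v] += 1
print(len(cells), "vertices,", len(edges), "edges")       # 16 vertices, 56 edges
print("regular of degree", set(degree.values()))          # {7}
```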


An Overview And Evaluation Of SynthETC: A Statistical Model For Extra-Tropical Cyclones, Rafael Uryayev Jan 2019

An Overview And Evaluation Of SynthETC: A Statistical Model For Extra-Tropical Cyclones, Rafael Uryayev

Dissertations and Theses

Extratropical cyclones (ETCs) are the most common weather phenomena affecting the United States, Canada, and Europe. They can pose serious hazards over large swaths of area. In this thesis, a statistical model of ETCs, called SynthETC, is discussed. The model accounts for the genesis, track path, termination, and intensity of statistically generated ETCs. Genesis is modeled as a Poisson process, whose mean is determined by climate and historical information. Tracks are modeled as a regression-mean determined by climate and historical information plus a stochastic component. Lysis is modeled using logistic regression, with climate states as covariates. Intensity is modeled …
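A schematic toy version of the components described above, not SynthETC itself: Poisson genesis counts, a track with a mean drift plus a stochastic component, and logistic termination at each step. All parameters are invented for illustration.

```python
# Schematic toy storm generator (not SynthETC; all parameters are invented).
import numpy as np

rng = np.random.default_rng(12)

n_storms = rng.poisson(lam=8)                      # genesis: storm count for a season
for s in range(n_storms):
    lat, lon = 40.0 + rng.normal(0, 3), -70.0 + rng.normal(0, 5)   # genesis point
    track = [(lat, lon)]
    while True:
        # Track step: mean (northeastward) drift plus a stochastic component.
        lat += 0.8 + rng.normal(0, 0.5)
        lon += 1.5 + rng.normal(0, 0.8)
        track.append((lat, lon))
        # Lysis: termination probability grows with latitude (logistic form).
        p_lysis = 1 / (1 + np.exp(-(lat - 60.0) / 3.0))
        if rng.random() < p_lysis or len(track) > 40:
            break
    print(f"storm {s}: {len(track)} positions, final latitude {lat:.1f}")
```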


Biodiversity And Distribution Of Benthic Foraminifera In Harrington Sound, Bermuda: The Effects Of Physical And Geochemical Factors On Dominant Taxa, Nam Le Jan 2019

Biodiversity And Distribution Of Benthic Foraminifera In Harrington Sound, Bermuda: The Effects Of Physical And Geochemical Factors On Dominant Taxa, Nam Le

Honors Theses

Harrington Sound, Bermuda, is a nearly enclosed lagoon acting as a subtropical/tropical, carbonate-rich basin in which carbonate sediments, reef patches, and carbonate-producing organisms accumulate. Here, one of the most important calcareous groups is the Foraminifera. Analyses of common benthic orders, including miliolids (Quinqueloculina and Triloculina spp.) and rotaliids (Homotrema rubrum, Elphidium spp., and Ammonia beccarii), are essential in understanding past and present environmental conditions affecting the island's coastal environment. These taxa have been studied previously; however, factors explaining their individual patterns of abundance in the Sound are not well detailed. The goal of this study is …


Statistical Investigation Of Road And Railway Hazardous Materials Transportation Safety, Amirfarrokh Iranitalab Nov 2018

Statistical Investigation Of Road And Railway Hazardous Materials Transportation Safety, Amirfarrokh Iranitalab

Department of Civil and Environmental Engineering: Dissertations, Theses, and Student Research

Transportation of hazardous materials (hazmat) in the United States (U.S.) constituted 22.8% of the total tonnage transported in 2012 with an estimated value of more than 2.3 billion dollars. As such, hazmat transportation is a significant economic activity in the U.S. However, hazmat transportation exposes people and environment to the infrequent but potentially severe consequences of incidents resulting in hazmat release. Trucks and trains carried 63.7% of the hazmat in the U.S. in 2012 and are the major foci of this dissertation. The main research objectives were 1) identification and quantification of the effects of different factors on occurrence and …


Season-Ahead Forecasting Of Water Storage And Irrigation Requirements – An Application To The Southwest Monsoon In India, Arun Ravindranath, Naresh Devineni, Upmanu Lall, Paulina Concha Larrauri Oct 2018

Season-Ahead Forecasting Of Water Storage And Irrigation Requirements – An Application To The Southwest Monsoon In India, Arun Ravindranath, Naresh Devineni, Upmanu Lall, Paulina Concha Larrauri

Publications and Research

Water risk management is a ubiquitous challenge faced by stakeholders in the water or agricultural sector. We present a methodological framework for forecasting water storage requirements and present an application of this methodology to risk assessment in India. The application focused on forecasting crop water stress for potatoes grown during the monsoon season in the Satara district of Maharashtra. Pre-season large-scale climate predictors used to forecast water stress were selected based on an exhaustive search method that evaluates for highest ranked probability skill score and lowest root-mean-squared error in a leave-one-out cross-validation mode. Adaptive forecasts were made in the years …
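A sketch of the screening criterion in its simplest form: ranking candidate predictors by the leave-one-out cross-validated RMSE of a linear forecast, using scikit-learn on synthetic data. The study also uses the ranked probability skill score, which is not reproduced here.

```python
# Rank candidate predictors by leave-one-out cross-validated RMSE (synthetic data).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(13)
n_years = 35
candidates = {
    "predictor_A": rng.normal(size=n_years),
    "predictor_B": rng.normal(size=n_years),
}
water_stress = 0.9 * candidates["predictor_A"] + rng.normal(scale=0.5, size=n_years)

for name, x in candidates.items():
    preds = cross_val_predict(LinearRegression(), x.reshape(-1, 1), water_stress,
                              cv=LeaveOneOut())
    rmse = np.sqrt(np.mean((preds - water_stress) ** 2))
    print(f"{name}: leave-one-out RMSE = {rmse:.3f}")
```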


Yelp’s Review Filtering Algorithm, Yao Yao, Ivelin Angelov, Jack Rasmus-Vorrath, Mooyoung Lee, Daniel W. Engels Aug 2018

Yelp’s Review Filtering Algorithm, Yao Yao, Ivelin Angelov, Jack Rasmus-Vorrath, Mooyoung Lee, Daniel W. Engels

SMU Data Science Review

In this paper, we present an analysis of features influencing Yelp's proprietary review filtering algorithm. Classifying or misclassifying reviews as recommended or non-recommended affects average ratings, consumer decisions, and ultimately, business revenue. Our analysis involves systematically sampling and scraping Yelp restaurant reviews. Features are extracted from review metadata and engineered from metrics and scores generated using text classifiers and sentiment analysis. The coefficients of a multivariate logistic regression model were interpreted as quantifications of the relative importance of features in classifying reviews as recommended or non-recommended. The model classified review recommendations with an accuracy of 78%. We found that reviews …
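A hedged sketch of the modeling step: TF-IDF text features feeding a logistic regression whose coefficients are read as relative feature importance. The toy reviews and labels below are invented, and the real model also draws on metadata and sentiment-derived features.

```python
# TF-IDF features plus a logistic regression; coefficients indicate which
# terms push a review toward "recommended" or "non-recommended" (toy data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

reviews = [
    "Great food and friendly staff, will come back",
    "Terrible service, the food was cold",
    "Amazing experience, highly recommend this place",
    "Click my profile for discount codes!!!",          # spam-like review
    "Best deal ever, visit my website for coupons",    # spam-like review
    "Decent meal, slow service but nice ambiance",
]
recommended = [1, 1, 1, 0, 0, 1]    # 1 = recommended, 0 = filtered/non-recommended

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(reviews)
clf = LogisticRegression().fit(X, recommended)

# Largest-magnitude coefficients indicate the most influential terms.
terms = vectorizer.get_feature_names_out()
weights = sorted(zip(clf.coef_[0], terms))
print("most 'non-recommended' terms:", [t for _, t in weights[:3]])
print("most 'recommended' terms:    ", [t for _, t in weights[-3:]])
```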


Generalizing Multistage Partition Procedures For Two-Parameter Exponential Populations, Rui Wang Aug 2018

Generalizing Multistage Partition Procedures For Two-Parameter Exponential Populations, Rui Wang

University of New Orleans Theses and Dissertations

ANOVA is a classic tool for multiple comparisons and has been widely used in numerous disciplines due to its simplicity and convenience. The ANOVA procedure is designed to test whether a number of populations differ. This is followed by the usual multiple comparison tests to rank the populations. However, the ANOVA procedure does not guarantee that the probability of selecting the best population exceeds some desired, prespecified level. This shortcoming of the ANOVA procedure was overcome by researchers in the early 1950s by designing experiments with the goal of selecting the best …
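As a reference point for the discussion above, a one-way ANOVA F-test with SciPy on synthetic samples; the closing comment notes the gap that selection procedures are meant to fill.

```python
# One-way ANOVA F-test on synthetic samples from three populations.
import numpy as np
from scipy import stats

rng = np.random.default_rng(14)
group_a = rng.exponential(scale=1.0, size=30)
group_b = rng.exponential(scale=1.0, size=30)
group_c = rng.exponential(scale=1.8, size=30)   # shifted mean

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# Rejecting the null says the means are not all equal; it does not by itself
# guarantee a high probability of correctly selecting the best population,
# which is the gap the selection procedures discussed above are meant to fill.
```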


Distribution Of A Sum Of Random Variables When The Sample Size Is A Poisson Distribution, Mark Pfister Aug 2018

Distribution Of A Sum Of Random Variables When The Sample Size Is A Poisson Distribution, Mark Pfister

Electronic Theses and Dissertations

A probability distribution is a statistical function that describes the probability of possible outcomes in an experiment or occurrence. There are many different probability distributions that give the probability of an event happening, given some sample size n. An important question in statistics is to determine the distribution of the sum of independent random variables when the sample size n is fixed. For example, it is known that the sum of n independent Bernoulli random variables with success probability p is a Binomial distribution with parameters n and p. However, this is not true when the sample size …
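A concrete instance of the contrast described above: with a fixed sample size n, a sum of Bernoulli(p) variables is Binomial(n, p), but when the sample size is Poisson(λ) the sum is Poisson(λp). The quick simulation below checks the mean and variance in the Poisson case.

```python
# With a Poisson(lam) sample size, the sum of Bernoulli(p) variables is
# Poisson(lam * p); a quick mean/variance check by simulation.
import numpy as np

rng = np.random.default_rng(15)
lam, p, reps = 10.0, 0.3, 200_000

N = rng.poisson(lam, size=reps)        # random sample sizes
sums = rng.binomial(N, p)              # sum of N Bernoulli(p) trials, per replication

print("simulated mean / variance:   ", sums.mean().round(3), sums.var().round(3))
print("Poisson(lam * p) mean / var: ", lam * p, lam * p)
# Mean and variance agree (both lam * p), consistent with a Poisson(lam * p) sum.
```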


Deep Learning Analysis Of Limit Order Book, Xin Xu May 2018

Deep Learning Analysis Of Limit Order Book, Xin Xu

Arts & Sciences Electronic Theses and Dissertations

In this paper, we build a deep neural network for modeling spatial structure in the limit order book and make predictions of the future best ask or best bid price, based on the ideas of Sirignano (2016). We propose an intuitive data processing method to approximate data that is unavailable to us using only level I data, which is more widely available. The model is based on the idea that there is local dependence between the best ask or best bid price and the sizes of related orders. First, we use logistic regression to prove that this approach is reasonable. To show the advantages …


Application Of Remote Sensing And Machine Learning Modeling To Post-Wildfire Debris Flow Risks, Priscilla Addison Jan 2018

Application Of Remote Sensing And Machine Learning Modeling To Post-Wildfire Debris Flow Risks, Priscilla Addison

Dissertations, Master's Theses and Master's Reports

Historically, post-fire debris flows (DFs) have often been more deadly than the fires that preceded them. Fires can transform a location that had no history of DFs to one that is primed for it. Studies have found that the higher the severity of the fire, the higher the probability of DF occurrence. Due to high fatalities associated with these events, several statistical models have been developed for use as emergency decision support tools. These previous models used linear modeling approaches that produced subpar results. Our study therefore investigated the application of nonlinear machine learning modeling as an alternative. Existing models …


Comparing Various Machine Learning Statistical Methods Using Variable Differentials To Predict College Basketball, Nicholas Bennett Jan 2018

Comparing Various Machine Learning Statistical Methods Using Variable Differentials To Predict College Basketball, Nicholas Bennett

Williams Honors College, Honors Research Projects

The purpose of this Senior Honors Project is to research, study, and demonstrate newfound knowledge of various machine learning statistical techniques that are not covered in the University of Akron’s statistics major curriculum. This report will be an overview of three machine-learning methods that were used to predict NCAA Basketball results, specifically, the March Madness tournament. The variables used for these methods, models, and tests will include numerous variables kept throughout the season for each team, along with a couple variables that are used by the selection committee when tournament teams are being picked. The end goal is to find …


Some New And Generalized Distributions Via Exponentiation, Gamma And Marshall-Olkin Generators With Applications, Hameed Abiodun Jimoh Jan 2018

Some New And Generalized Distributions Via Exponentiation, Gamma And Marshall-Olkin Generators With Applications, Hameed Abiodun Jimoh

Electronic Theses and Dissertations

Three new generalized distributions developed via competing risks, gamma generator, Marshall-Olkin generator, and exponentiation techniques are proposed and studied. Structural properties including quantile functions, hazard rate functions, moments, conditional moments, mean deviations, Rényi entropy, the distribution of order statistics, and maximum likelihood estimates are presented. Monte Carlo simulation is employed to examine the performance of the proposed distributions. Applications of the generalized distributions to real lifetime data are presented to illustrate the usefulness of the models.
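A generic sketch of the Monte Carlo step: repeatedly simulate from a fitted family and summarize the bias and RMSE of the maximum likelihood estimates. A plain Weibull is used here as a stand-in for the generalized distributions proposed in the thesis.

```python
# Monte Carlo check of estimator performance: bias and RMSE of the MLEs over
# repeated samples (a plain Weibull stands in for the proposed distributions).
import numpy as np
from scipy import stats

rng = np.random.default_rng(16)
true_shape, true_scale, n, reps = 1.8, 2.5, 100, 500

estimates = np.empty((reps, 2))
for r in range(reps):
    sample = stats.weibull_min.rvs(c=true_shape, scale=true_scale, size=n,
                                   random_state=rng)
    c_hat, _, scale_hat = stats.weibull_min.fit(sample, floc=0)
    estimates[r] = (c_hat, scale_hat)

bias = estimates.mean(axis=0) - np.array([true_shape, true_scale])
rmse = np.sqrt(((estimates - [true_shape, true_scale]) ** 2).mean(axis=0))
print("bias (shape, scale):", bias.round(4))
print("RMSE (shape, scale):", rmse.round(4))
```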


Statistical Analysis Of Momentum In Basketball, Mackenzi Stump Dec 2017

Statistical Analysis Of Momentum In Basketball, Mackenzi Stump

Honors Projects

The “hot hand” in sports has been debated for as long as sports have been around. The debate involves whether streaks and slumps in sports are true phenomena or just simply perceptions in the mind of the human viewer. This statistical analysis of momentum in basketball analyzes the distribution of time between scoring events for the BGSU Women’s Basketball team from 2011-2017. We discuss how the distribution of time between scoring events changes with normal game factors such as location of the game, game outcome, and several other factors. If scoring events during a game were always randomly distributed, or …
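A sketch of the underlying idea: if scoring followed a homogeneous Poisson process, the times between scores would be exponential. The code fits an exponential to synthetic inter-score gaps and runs a Kolmogorov-Smirnov test with SciPy; the gap data are invented, and the p-value is only approximate because the scale is estimated from the same sample.

```python
# Test whether inter-score times look exponential (i.e., scoring is "random").
import numpy as np
from scipy import stats

rng = np.random.default_rng(17)
# Synthetic inter-score times in seconds (gamma-distributed, i.e. "streaky").
gaps = rng.gamma(shape=2.0, scale=30.0, size=300)

loc, scale = stats.expon.fit(gaps, floc=0)          # fitted exponential mean gap = scale
ks_stat, p_value = stats.kstest(gaps, "expon", args=(loc, scale))
print(f"fitted mean gap = {scale:.1f} s, KS statistic = {ks_stat:.3f}, p = {p_value:.4f}")
# A small p-value is evidence against purely random (exponential) gaps; note the
# p-value is approximate because the scale was estimated from the same data.
```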