Statistical Models

Articles 1 - 30 of 72

Full-Text Articles in Other Statistics and Probability

Reducing Food Scarcity: The Benefits Of Urban Farming, S.A. Claudell, Emilio Mejia Dec 2023

Journal of Nonprofit Innovation

Urban farming can enhance the lives of communities and help reduce food scarcity. This paper presents a conceptual prototype of an efficient urban farming community that can be scaled for a single apartment building or an entire community across all global geoeconomic regions, including densely populated cities and rural, developing towns and communities. When deployed in coordination with smart crop choices, local farm support, and efficient transportation, the result is not just sustainability but also increased fresh produce accessibility, optimized nutritional value, elimination of ‘forever chemicals’, reduced transportation costs, and global environmental benefits.

Imagine Doris, who is …


The Use Of Regularization To Detect Racial Inequities In Pay Equity Studies: An Empirical Study And Reflections On Regulation Methods, Christopher M. Peña Nov 2023

Electronic Theses and Dissertations

Since the late 1970s, multiple linear regression has been the preferred method for identifying discrimination in pay. An empirical study on this topic was conducted using quantitative critical methods. A literature review first examined conflicting views on using multiple linear regression in pay equity studies. The review found that multiple linear regression is used so prevalently in pay equity studies because the courts and practitioners have widely accepted it and because of its simplicity and ability to parse multiple sources of variance simultaneously. Commentaries in the literature cautioned about errors in model specification, the use of tainted variables, and the …


Addressing The Impact Of Time-Dependent Social Groupings On Animal Survival And Recapture Rates In Mark-Recapture Studies, Alexandru M. Draghici Jun 2023

Electronic Thesis and Dissertation Repository

Mark-recapture (MR) models typically assume that individuals under study have independent survival and recapture outcomes. One such model of interest is known as the Cormack-Jolly-Seber (CJS) model. In this dissertation, we conduct three major research projects focused on studying the impact of violating the independence assumption in MR models along with presenting extensions which relax the independence assumption. In the first project, we conduct a simulation study to address the impact of failing to account for pair-bonded animals having correlated recapture and survival fates on the CJS model. We examined the impact of correlation on the likelihood ratio test (LRT), …


A New Generalized Gamma-Weibull Distribution And Its Applications, Nihimat Iyebuhola Aleshinloye, Samuel Adewale Aderoju, Alfred Adewole Abiodun, Bako Lukmon Taiwo Apr 2023

Al-Bahir Journal for Engineering and Pure Sciences

In this paper, a New Generalized Gamma-Weibull (NGGW) distribution is developed by compounding Weibull and generalized gamma distribution. Some mathematical properties such as moments, Rényi entropy and order statistics are derived and discussed. The maximum likelihood estimation (MLE) method is used to estimate the model parameters. The proposed model is applied to two real-life datasets to illustrate its performance and flexibility as compared to some other competing distributions. The results obtained show that the new distribution fits each of the data better than the other competing distributions.


Analyzing Relationships With Machine Learning, Oscar Ko Feb 2023

Dissertations, Theses, and Capstone Projects

Procedurally, this project aims to take a dataset, analyze it, and offer insights to the audience in an easy-to-digest format. Conceptually, this project will seek to explore questions like: “Do couples that meet through online dating or dating apps have higher or lower quality relationships?”, “Can any features in this dataset help predict how a subject would rate their relationship quality?”, and “What other insights can I derive from using machine learning for exploratory analysis?” The intended audience for this project is anyone interested in romantic relationships or machine learning.

The dataset is from a Stanford University survey, “How Couples …


Carnivore And Ungulate Occurrence In A Fire-Prone Region, Sara J. Moriarty-Graves Jan 2023

Cal Poly Humboldt theses and projects

Increasing fire size and severity in the western United States causes changes to ecosystems, species’ habitat use, and interspecific interactions. Wide-ranging carnivore and ungulate mammalian species and their interactions may be influenced by an increase in fire activity in northern California. Depending on the fire characteristics, ungulates may benefit from burned habitat due to an increase in forage availability, while carnivore species may be differentially impacted, but ultimately driven by bottom-up processes from a shift in prey availability. I used a three-step approach to estimate the single-species occupancy of four large mammal species: mountain lion (Puma concolor), coyote …


Network Intrusion Detection Using Deep Reinforcement Learning, Hamed T. Sanusi Jan 2023

Electronic Theses and Dissertations

This thesis delves into cybersecurity by applying Deep Reinforcement Learning (DRL) to network intrusion detection. One advantage of DRL is the ability to adapt to changing network conditions and evolving attack methods, making it a promising solution for addressing the challenges involved in intrusion detection. The thesis will also discuss the obstacles and benefits of using classification methods for network intrusion detection and the need for high-quality training data. To train and test our proposed method, the NSL-KDD dataset was used and then adjusted by converting it from a multi-class to a binary classification problem, achieved by merging all attacks into one class. …


Predicting Insulin Pump Therapy Settings, Riccardo L. Ferraro, David Grijalva, Alex Trahan Sep 2022

SMU Data Science Review

Millions of people live with diabetes worldwide [7]. To mitigate some of the many symptoms associated with diabetes, an estimated 350,000 people in the United States rely on insulin pumps [17]. For many of these people, how effectively their insulin pump performs is the difference between sleeping through the night and a life-threatening emergency treatment at a hospital. Three programmed insulin pump therapy settings governing effective insulin pump function are: Basal Rate (BR), Insulin Sensitivity Factor (ISF), and Carbohydrate Ratio (ICR). For many people using insulin pumps, these therapy settings are often not correct, given their physiological needs. While …


Exploration In Mental Performance For Division 1 SEC College Football Student Athletes, Alex Burgdorf Aug 2022

Department of Occupational Therapy Entry-Level Capstone Projects

The stigma surrounding mental health in sports has made intervention difficult. “There is a need for various actors to provide more effective strategies to overcome the stigma that surrounds mental illness, increase mental health literacy in the athlete/coach community, and address athlete-specific barriers to seeking treatment for mental illness” (Castadelli-Maia et al., 2019). The athletes in the football program at the University of Tennessee face more pressure today than ever in history. They have their class schedule, practice and training every day, and meetings with their position coaches. Now, with the introduction of name, image, and likeness (NIL) allowing players to …


A Bayesian Programming Approach To Car-Following Model Calibration And Validation Using Limited Data, Franklin Abodo Jun 2022

FIU Electronic Theses and Dissertations

Traffic simulation software is used by transportation researchers and engineers to design and evaluate changes to roadway networks. Underlying these simulators are mathematical models of microscopic driver behavior from which macroscopic measures of flow and congestion can be recovered. Many models are intended to apply to only a subset of possible traffic scenarios and roadway configurations, while others do not have any explicit constraint on their applicability. Work zones on highways are one scenario for which no model invented to date has been shown to accurately reproduce realistic driving behavior. This makes it difficult to optimize for safety and other …


Realtime Event Detection In Sports Sensor Data With Machine Learning, Mallory Cashman Jan 2022

Honors Theses and Capstones

Machine learning models can be trained to classify time-series-based sports motion data, without reliance on assumptions about the capabilities of the users or sensors. This can be applied to predict the count of occurrences of an event in a time period. The experiment for this research uses lacrosse data, collected in partnership with SPAITR - a UNH undergraduate startup developing motion tracking devices for lacrosse. Decision Tree and Support Vector Machine (SVM) models are trained and perform with high success rates. These models improve upon previous work in human motion event detection and can be used as a reference …


Role Of Inhibition And Spiking Variability In Ortho- And Retronasal Olfactory Processing, Michelle F. Craft Jan 2022

Theses and Dissertations

Odor perception is the impetus for important animal behaviors, most pertinently for feeding, but also for mating and communication. There are two predominant modes of odor processing: odors pass through the front of the nose (ortho) while inhaling and sniffing, or through the rear (retro) during exhalation and while eating and drinking. Despite the importance of olfaction for an animal’s well-being and specifically that ortho and retro naturally occur, it is unknown whether the modality (ortho versus retro) is transmitted to cortical brain regions, which could significantly instruct how odors are processed. Prior imaging studies show different …


Approximate Likelihood Based Estimations For Joint Models With Intractable Likelihoods, Karl Stessy M. Bisselou Dec 2021

Theses & Dissertations

This dissertation focuses on the development of approximation approaches for the joint modeling (JM) of repeated measures data and time-to-event data in the presence of analytically or numerically intractable likelihoods. Current likelihood-based inferences for JMs show several limitations including (i) intractability of integrals during marginal likelihood derivations due to the complexity in computations, and (ii) the large number of nuisance parameters (unobserved) posing a problem with convergence. The h-likelihood (HL) and synthetic likelihood (SL) are two computationally efficient estimation approaches that overcome these challenges.

In the presence of extremely high censoring rates, the HL can produce biased parameter estimates. We …


Confidence Interval For The Mean Of A Beta Distribution, Sean Rangel Dec 2021

Electronic Theses and Dissertations

Statistical inference for the mean of a beta distribution has become increasingly popular in various fields of academic research. In this study, we developed a novel statistical model from likelihood-based techniques to evaluate various confidence interval techniques for the mean of a beta distribution. Simulation studies will be implemented to compare the performance of the confidence intervals. In addition to the development and study involving confidence intervals, we will also apply the confidence intervals to real biological data that was gathered by the Department of Biology at Stephen F. Austin State University and provide recommendations on the best practice.


Application Of Randomness In Finance, Jose Sanchez, Daanial Ahmad, Satyanand Singh May 2021

Publications and Research

Brownian motion, also known as a Wiener process, can be thought of as a random walk. In our project we briefly discuss the fluctuations of financial indices and relate them to Brownian motion and the modeling of stock prices.
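
The random-walk view of Brownian motion mentioned in this abstract can be illustrated with a minimal sketch. It assumes a symmetric walk whose steps are plus or minus the square root of the time increment, a standard textbook construction chosen here for illustration and not taken from the project itself; the function names are hypothetical.

```python
import math
import random


def scaled_random_walk(n_steps, t_end=1.0, seed=0):
    """Return n_steps + 1 positions of a symmetric +/- sqrt(dt) random walk.

    With dt = t_end / n_steps, the position at time t has variance close to t,
    which is the scaling under which the walk approximates Brownian motion.
    """
    rng = random.Random(seed)
    step = math.sqrt(t_end / n_steps)
    path = [0.0]
    for _ in range(n_steps):
        # Move up or down by one scaled step with equal probability.
        path.append(path[-1] + (step if rng.random() < 0.5 else -step))
    return path


def sample_variance_at_end(n_paths=2000, n_steps=400):
    """Estimate the variance of the walk's endpoint over many seeded paths."""
    ends = [scaled_random_walk(n_steps, seed=s)[-1] for s in range(n_paths)]
    mean = sum(ends) / n_paths
    return sum((x - mean) ** 2 for x in ends) / (n_paths - 1)
```

Calling `sample_variance_at_end()` should give a value near 1, the variance of a standard Wiener process at time 1. Stock-price models typically build on this by exponentiating a drifted version of the path, which is outside this sketch.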


Assessing The Variations Of Educational Attainment At National And Subnational Levels Using Hierarchical Linear Models, Bingxin Qi Jan 2021

Electronic Theses and Dissertations

Education is a human right, and equal access to education is not only crucial for an individual’s well-being, but also essential for eradicating poverty, ensuring long-term prosperity for all, transforming the society, and achieving sustainable development. Measuring education development, especially the variations of educational attainment, in a timely and accurate manner can help educators, practitioners, scientists, and policymakers compare and evaluate various education indicators at both subnational and national levels. This research presents an approach that combines multi-source and multidimensional data including population distribution, human settlement, and education data to assess and explore educational attainment trajectories at both national and …


Neither “Post-War” Nor Post-Pregnancy Paranoia: How America’s War On Drugs Continues To Perpetuate Disparate Incarceration Outcomes For Pregnant, Substance-Involved Offenders, Becca S. Zimmerman Jan 2021

Pitzer Senior Theses

This thesis investigates the unique interactions between pregnancy, substance involvement, and race as they relate to the War on Drugs and the hyper-incarceration of women. Using ordinary least squares regression analyses and data from the Bureau of Justice Statistics’ 2016 Survey of Prison Inmates, I examine if (and how) pregnancy status, drug use, race, and their interactions influence two length-of-incarceration outcomes: sentence length and amount of time spent in jail between arrest and imprisonment. The results collectively indicate that pregnancy decreases length of incarceration outcomes for those offenders who are not substance-involved but not evenhandedly -- benefitting white …


Applying The Data: Predictive Analytics In Sport, Anthony Teeter, Margo Bergman Nov 2020

Access*: Interdisciplinary Journal of Student Research and Scholarship

The history of wagering predictions and their impact on wide-reaching disciplines such as statistics and economics dates to at least the 1700s, if not before. Predicting the outcomes of sports is a multibillion-dollar business that capitalizes on these tools but is in constant development with the addition of big data analytics methods. Sportsline.com, a popular website for fantasy sports leagues, provides odds predictions in multiple sports, produces proprietary computer models of both winning and losing teams, and provides specific point estimates. To test likely candidates for inclusion in these prediction algorithms, the authors developed a computer model, and test …


Statistical Models And Analysis Of Univariate And Multivariate Degradation Data, Lochana Palayangoda May 2020

Statistical Science Theses and Dissertations

For degradation data in reliability analysis, estimation of the first-passage time (FPT) distribution to a threshold provides valuable information on reliability characteristics. Recently, Balakrishnan and Qin (2019; Applied Stochastic Models in Business and Industry, 35:571-590) studied a nonparametric method to approximate the FPT distribution of such degradation processes if the underlying process type is unknown. In this thesis, we propose improved techniques based on saddlepoint approximation, which improve upon their suggested methods. Numerical examples and Monte Carlo simulation studies are used to illustrate the advantages of the proposed techniques. Limitations of the improved techniques are discussed and some possible solutions …


Analyzing Competitive Balance In Professional Sport, Kevin Alwell May 2020

Honors Scholar Theses

In this paper we review several measures to statistically analyze competitive balance and report which leagues have a wider variance of performance among their competitors. Each league seeks to maintain high levels of parity, making matches and the overall season more unpredictable and appealing to the general audience. Here we quantify competitive advantage across major sports leagues using several statistical methods, so that leagues can optimize their revenue.
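
The abstract does not name the measures it reviews. As one illustration of a variance-based approach, the sketch below computes the Noll-Scully ratio, a commonly cited competitive-balance measure: the observed standard deviation of team win percentages divided by the standard deviation expected if every game were a coin flip. Its use here is an assumption, not the paper's stated method, and the function name is hypothetical.

```python
import math
import statistics


def noll_scully_ratio(win_pcts, games_per_team):
    """Ratio of observed to idealized spread in team win percentages.

    win_pcts: win percentages in [0, 1], one per team (hypothetical input).
    games_per_team: number of games each team plays in the season.
    A value near 1 suggests a balanced league; larger values suggest less parity.
    """
    actual_sd = statistics.pstdev(win_pcts)
    # Standard deviation of a win percentage when each game is a fair coin flip.
    ideal_sd = 0.5 / math.sqrt(games_per_team)
    return actual_sd / ideal_sd
```

For example, `noll_scully_ratio([0.25, 0.75], 16)` compares a made-up two-team league against the coin-flip benchmark for a 16-game season. Because the ratio normalizes by season length, it allows comparison across leagues that play different numbers of games.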


Evaluating An Ordinal Output Using Data Modeling, Algorithmic Modeling, And Numerical Analysis, Martin Keagan Wynne Brown Jan 2020

Murray State Theses and Dissertations

Data and algorithmic modeling are two different approaches used in predictive analytics. The models discussed from these two approaches include the proportional odds logit model (POLR), the vector generalized linear model (VGLM), the classification and regression tree model (CART), and the random forests model (RF). Patterns in the data were analyzed using trigonometric polynomial approximations and Fast Fourier Transforms. Predictive modeling is used frequently in statistics and data science to find the relationship between the explanatory (input) variables and a response (output) variable. Both approaches prove advantageous in different cases depending on the data set. In our case, the data …


Aggregate Loss Model With Poisson-Tweedie Loss Frequency, Si Chen Jan 2020

Theses and Dissertations (Comprehensive)

The aggregate loss model has applications in various areas such as financial risk management and actuarial science. The aggregate loss is the sum of all random losses that occur in a period, and it is governed by both the loss severity and the loss frequency. While the impact of the loss severity on aggregate loss is well studied, less attention has been paid to the influence of loss frequency on aggregate loss, which motivates our study. In this thesis, we enrich the aggregate loss framework by introducing the Poisson-Tweedie distribution as a candidate for modelling loss frequency, prove the closedness of Poisson-Tweedie …
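
The way frequency and severity jointly govern the aggregate loss can be written out in the standard collective risk form. The notation below is an assumption for illustration and is not taken from the thesis: $N$ is the loss count in the period, $X_1, X_2, \dots$ are the individual loss amounts, assumed independent and identically distributed and independent of $N$.

```latex
S = \sum_{i=1}^{N} X_i, \qquad
\mathbb{E}[S] = \mathbb{E}[N]\,\mathbb{E}[X], \qquad
\operatorname{Var}(S) = \mathbb{E}[N]\,\operatorname{Var}(X) + \operatorname{Var}(N)\,\bigl(\mathbb{E}[X]\bigr)^{2}.
```

Under these assumptions the frequency distribution enters the variance through both $\mathbb{E}[N]$ and $\operatorname{Var}(N)$, so an overdispersed count model changes the spread of $S$ even when the mean count is held fixed.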


Identifying Customer Churn In After-Market Operations Using Machine Learning Algorithms, Vitaly Briker, Richard Farrow, William Trevino, Brent Allen Dec 2019

SMU Data Science Review

This paper presents a comparative study on machine learning methods as they are applied to product associations, future purchase predictions, and predictions of customer churn in aftermarket operations. Association rules are used to help identify patterns across products and find correlations in customer purchase behaviour. Studying customer behaviour as it pertains to Recency, Frequency, and Monetary Value (RFM) helps inform customer segmentation and identifies customers with propensity to churn. Lastly, Flowserve’s customer purchase history enables the establishment of churn thresholds for each customer group and assists in constructing a model to predict future churners. The aim of this model is …


Ordinal Hyperplane Loss, Bob Vanderheyden Dec 2019

Doctor of Data Science and Analytics Dissertations

This research presents the development of a new framework for analyzing ordered class data, commonly called “ordinal class” data. The focus of the work is the development of classifiers (predictive models) that predict classes from available data. Ratings scales, medical classification scales, socio-economic scales, meaningful groupings of continuous data, facial emotional intensity and facial age estimation are examples of ordinal data for which data scientists may be asked to develop predictive classifiers. It is possible to treat ordinal classification like any other classification problem that has more than two classes. Specifying a model with this strategy does not fully utilize …


Machine Learning In Support Of Electric Distribution Asset Failure Prediction, Robert D. Flamenbaum, Thomas Pompo, Christopher Havenstein, Jade Thiemsuwan Aug 2019

SMU Data Science Review

In this paper, we present novel approaches to predicting asset failure in the electric distribution system. Failures in overhead power lines and their associated equipment, in particular, pose significant financial and environmental threats to electric utilities. Electric device failure furthermore poses a burden on customers and can pose serious risk to life and livelihood. Working with asset data acquired from an electric utility in Southern California, and incorporating environmental and geospatial data from around the region, we applied a Random Forest methodology to predict which overhead distribution lines are most vulnerable to failure. Our results provide evidence …


Unified Methods For Feature Selection In Large-Scale Genomic Studies With Censored Survival Outcomes, Lauren Spirko-Burns, Karthik Devarajan Mar 2019

COBRA Preprint Series

One of the major goals in large-scale genomic studies is to identify genes with a prognostic impact on time-to-event outcomes which provide insight into the disease's process. With rapid developments in high-throughput genomic technologies in the past two decades, the scientific community is able to monitor the expression levels of tens of thousands of genes and proteins resulting in enormous data sets where the number of genomic features is far greater than the number of subjects. Methods based on univariate Cox regression are often used to select genomic features related to survival outcome; however, the Cox model assumes proportional hazards …


Global Warming Statistical Analysis, Jared Skinner Jan 2019

Williams Honors College, Honors Research Projects

This paper will investigate global warming and its effects on natural disasters. I will review the historic movements of climate change and activism, as well as the current discussions surrounding global warming. Secondly, I will examine various datasets, paying attention to the severity and frequency of specific natural disasters. I will then touch briefly on the topic of catastrophe modeling as it relates to the increased risk and losses associated with the discussed natural disasters and how those put the problem of global warming in a framework which financial and government institutions can grasp. I will also be analyzing economic …


Rfviz: An Interactive Visualization Package For Random Forests In R, Christopher Beckett Dec 2018

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Random forests are very popular tools for predictive analysis and data science. They work for both classification (where there is a categorical response variable) and regression (where the response is continuous). Random forests provide proximities, and both local and global measures of variable importance. However, these quantities require special tools to be effectively used to interpret the forest. Rfviz is a sophisticated interactive visualization package and toolkit in R, specially designed for interpreting the results of a random forest in a user-friendly way. Rfviz uses a recently developed R package (loon) from the Comprehensive R Archive Network (CRAN) to create …


Season-Ahead Forecasting Of Water Storage And Irrigation Requirements – An Application To The Southwest Monsoon In India, Arun Ravindranath, Naresh Devineni, Upmanu Lall, Paulina Concha Larrauri Oct 2018

Publications and Research

Water risk management is a ubiquitous challenge faced by stakeholders in the water or agricultural sector. We present a methodological framework for forecasting water storage requirements and present an application of this methodology to risk assessment in India. The application focused on forecasting crop water stress for potatoes grown during the monsoon season in the Satara district of Maharashtra. Pre-season large-scale climate predictors used to forecast water stress were selected based on an exhaustive search method that evaluates for highest ranked probability skill score and lowest root-mean-squared error in a leave-one-out cross-validation mode. Adaptive forecasts were made in the years …


Yelp’s Review Filtering Algorithm, Yao Yao, Ivelin Angelov, Jack Rasmus-Vorrath, Mooyoung Lee, Daniel W. Engels Aug 2018

SMU Data Science Review

In this paper, we present an analysis of features influencing Yelp's proprietary review filtering algorithm. Classifying or misclassifying reviews as recommended or non-recommended affects average ratings, consumer decisions, and ultimately, business revenue. Our analysis involves systematically sampling and scraping Yelp restaurant reviews. Features are extracted from review metadata and engineered from metrics and scores generated using text classifiers and sentiment analysis. The coefficients of a multivariate logistic regression model were interpreted as quantifications of the relative importance of features in classifying reviews as recommended or non-recommended. The model classified review recommendations with an accuracy of 78%. We found that reviews …