Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons

Articles 61 - 90 of 90

Full-Text Articles in Statistical Models

Realtime Event Detection In Sports Sensor Data With Machine Learning, Mallory Cashman Jan 2022

Honors Theses and Capstones

Machine learning models can be trained to classify time-series-based sports motion data without relying on assumptions about the capabilities of the users or sensors. This can be applied to predict the number of occurrences of an event in a time period. The experiment for this research uses lacrosse data, collected in partnership with SPAITR, a UNH undergraduate startup developing motion tracking devices for lacrosse. Decision Tree and Support Vector Machine (SVM) models are trained and perform with high success rates. These models improve upon previous work in human motion event detection and can be used as a reference …


Identification And Characterization Of Forest Fire Risk Zones Leveraging Machine Learning Methods, Joshua Balson, Matt Chinchilla, Cam Lu, Jeff Washburn, Nibhrat Lohia Dec 2021

SMU Data Science Review

Across the United States, record numbers of wildfires are observed, costing billions of dollars in property damage, polluting the environment, and putting lives at risk. The ability of emergency management professionals, city planners, and private entities such as insurance companies to determine if an area is at higher risk of a fire breaking out has never been greater. This paper proposes a novel methodology for identifying and characterizing zones with increased risks of forest fires. Methods involving machine learning techniques use widely available recorded data, thus making it possible to implement the tool quickly.


Comparing Machine Learning Techniques With State-Of-The-Art Parametric Prediction Models For Predicting Soybean Traits, Susweta Ray Dec 2021

Department of Statistics: Dissertations, Theses, and Student Work

Soybean is a significant source of protein and oil, and is also widely used as animal feed. Thus, developing lines that are superior in terms of yield, protein, and oil content is important to feed the ever-growing population. In contrast to high-cost phenotyping, genotyping is both cost- and time-efficient for breeders, whereas evaluating new lines in different environments (location-year combinations) can be costly. Several genomic prediction (GP) methods have been developed to use the marker and environment data effectively to predict the yield or other relevant phenotypic traits of crops. Our study compares a conventional GP method (GBLUP), a …


Interpolating Missing Data And Comparing Performance Of Common Interpolation Techniques From A 30-Year Water Quality Dataset, Wako Bungula, Danelle M. Larson Dr., Killian Davis, Richard Erickson Dr., Amber Lee, Casey Mckean, Frederick Miller, Alaina Stockdill, Enrika Hlavacek Nov 2021

Annual Symposium on Biomathematics and Ecology Education and Research

No abstract provided.


Monitoring Mammals At Multiple Scales: Case Studies From Carnivore Communities, Kadambari Devarajan Oct 2021

Doctoral Dissertations

Carnivores are distributed widely and threatened by habitat loss, poaching, climate change, and disease. They are considered integral to ecosystem function through their direct and indirect interactions with species at different trophic levels. Given the importance of carnivores, it is of high conservation priority to understand the processes driving carnivore assemblages in different systems. It is thus essential to determine the abiotic and biotic drivers of carnivore community composition at different spatial scales and address the following questions: (i) What factors influence carnivore community composition and diversity? (ii) How do the factors influencing carnivore communities vary across spatial and temporal …


Science Is For Everybody: A Resource For Understanding Glaciers, Climate, And Modeling, Emma Watson Oct 2021

Independent Study Project (ISP) Collection

Climate change threatens the existence of glaciers worldwide. In order to properly interact with these changing systems, we must first understand them. Glacial models provide an excellent way to do this; however, the language and mathematical concepts used in their creation are generally inaccessible to a common audience. This project presents an online resource for a general audience to interact with climate science, glaciology, and glacial modeling. Long-term goals for the project include the incorporation of a glacial model of Drangajökull, Vestfirðir, NW Iceland. As such, focus for the project includes a literature review of glaciers, Drangajökull in particular, …


Determining Malignancy: Can Mammogram Results Help Predict The Diagnosis Of Breast Tumors?, Taylor Behrens Aug 2021

Symposium of Student Scholars

Even with advancements in treatment and preventative care, breast cancer remains an epidemic claiming more than 40,000 American male and female lives each year. The mammogram dataset that I am analyzing was initially compiled in the early 1990s by a team from the University of Wisconsin-Madison. Past research diagnoses breast cancer from fine-needle aspirates. My research focuses on whether we can determine breast cancer diagnoses without the use of invasive procedures and, in particular, whether we can predict breast cancer based on mammogram data. Do measures of gray-scale texture, radius, concavity, perimeter, compactness, area, and smoothness of …


Empirical Fitting Of Periodically Repeating Environmental Data, Pavel Bělík, Andrew Hotchkiss, Brandon Perez, John Zobitz Aug 2021

Spora: A Journal of Biomathematics

We extend and generalize an approach for fitting models to periodically repeating data. Our method first detrends the data from a baseline function and then fits the data to a periodic (trigonometric, polynomial, or piecewise linear) function. The polynomial and piecewise linear functions are developed from assumptions of continuity and differentiability across each time period. We apply this approach to different datasets in the environmental sciences in addition to a synthetic dataset. Overall, the polynomial and piecewise linear approaches developed here performed as well as (or better than) the trigonometric approach when evaluated using statistical measures (R2 …
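
The two-step idea in this abstract (detrend against a baseline, then fit a periodic function) can be illustrated with a minimal sketch. It is not the authors' implementation: the linear baseline, the single sine/cosine harmonic, the known period, and the function name `fit_periodic` are all assumptions made here for illustration, and the data are synthetic.

```python
import numpy as np

def fit_periodic(t, y, period):
    """Detrend with a linear baseline, then fit one sine/cosine harmonic.

    Hypothetical helper: the baseline form and the single harmonic are
    assumptions for this sketch, not choices taken from the paper.
    """
    # Step 1: estimate and remove a linear baseline.
    slope, intercept = np.polyfit(t, y, 1)
    residual = y - (slope * t + intercept)

    # Step 2: least-squares fit of the residual to a*cos(wt) + b*sin(wt).
    omega = 2.0 * np.pi / period
    design = np.column_stack([np.cos(omega * t), np.sin(omega * t)])
    coef, *_ = np.linalg.lstsq(design, residual, rcond=None)
    return slope, intercept, coef

# Synthetic example: linear trend plus a sinusoid with period 1.
t = np.linspace(0.0, 10.0, 1001)
y = 0.5 * t + 2.0 + 3.0 * np.sin(2.0 * np.pi * t)

slope, intercept, (a_cos, b_sin) = fit_periodic(t, y, period=1.0)
amplitude = float(np.hypot(a_cos, b_sin))
print(f"slope={slope:.3f}, amplitude={amplitude:.3f}")
```

Because the baseline is estimated before the periodic term, the recovered slope and amplitude are close to, but not exactly, the values used to generate the data; fitting both parts jointly would remove that small bias.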


Ensemble Data Fitting For Bathymetric Models Informed By Nominal Data, Samantha Zambo Aug 2021

Dissertations

Due to the difficulty and expense of collecting bathymetric data, modeling is the primary tool to produce detailed maps of the ocean floor. Current modeling practices typically utilize only one interpolator; the industry standard is splines-in-tension.

In this dissertation we introduce a new nominal-informed ensemble interpolator designed to improve modeling accuracy in regions of sparse data. The method is guided by a priori domain knowledge provided by artificially intelligent classifiers. We recast geomorphological classifications, such as ‘seamount’ or ‘ridge’, as nominal data which we utilize as foundational shapes in an expanded ordinary least squares regression-based algorithm. To our knowledge …


Model-Free Descriptive Modeling For Multivariate Categorical Data With An Ordinal Dependent Variable, Li Wang Jul 2021

Doctoral Dissertations

In the process of statistical modeling, descriptive modeling plays an essential role in accelerating the formulation of plausible hypotheses in the subsequent explanatory modeling and facilitating the selection of potential variables in the subsequent predictive modeling. In particular, for multivariate categorical data analysis, it is desirable to use descriptive modeling methods for uncovering and summarizing the potential association structure among multiple categorical variables in a compact manner. However, many classical methods in this case either rely on strong assumptions for parametric models or become infeasible when the data dimension is high. To this end, we propose a model-free method …


Machine Learning Based Restaurant Sales Forecasting, Austin B. Schmidt May 2021

University of New Orleans Theses and Dissertations

To encourage proper employee scheduling for managing crew load, restaurants need accurate sales forecasting. We predict partitions of sales days, so each day is broken up into three sales periods: 10:00 AM-1:59 PM, 2:00 PM-5:59 PM, and 6:00 PM-10:00 PM. This study focuses on the middle timeslot, where sales forecasts should extend for one week. We gather three years of sales from 2016 to 2019 from a local restaurant to generate a new dataset for researching sales forecasting methods.

Outlined are methodologies used when going from raw data to a workable dataset. We test many machine learning models on …


Characterizing The Northern Hemisphere Circumpolar Vortex Through Space And Time, Nazla Bushra May 2021

LSU Doctoral Dissertations

The hemispheric-scale steering atmospheric circulation is represented by the circumpolar vortices (CPVs), the middle- and upper-tropospheric wind belts circumnavigating the poles. Variability in the CPV area, shape, and position is an important topic in geoenvironmental sciences because of the many links to environmental features. However, a means of characterizing the CPV has remained elusive. The goal of this research is to (i) identify the Northern Hemisphere CPV (NHCPV) and its morphometric characteristics, (ii) understand the daily characteristics of NHCPV area and circularity over time, (iii) identify and analyze spatiotemporal variability in the NHCPV’s centroid, and (iv) analyze how CPV features relate …


Application Of Randomness In Finance, Jose Sanchez, Daanial Ahmad, Satyanand Singh May 2021

Publications and Research

Brownian motion, also known as a Wiener process, can be thought of as a random walk. In our project we briefly discussed the fluctuations of financial indices and related them to Brownian motion and the modeling of stock prices.
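
As a rough illustration of the connection this abstract draws, the sketch below builds a discretized Wiener path as a cumulative sum of Gaussian increments (a random walk) and feeds it into geometric Brownian motion, a common textbook model for stock prices. The abstract does not say which price model the authors used, so the choice of geometric Brownian motion, the function name `simulate_gbm`, and every parameter value are assumptions.

```python
import numpy as np

def simulate_gbm(s0, mu, sigma, horizon, steps, rng):
    """Simulate a Wiener path and a geometric Brownian motion price path.

    Hypothetical helper: s0 is the starting price, mu the drift, sigma the
    volatility, horizon the time span in years, steps the number of steps.
    """
    dt = horizon / steps
    # Random walk: independent Gaussian steps with variance dt.
    increments = rng.normal(0.0, np.sqrt(dt), size=steps)
    w = np.concatenate([[0.0], np.cumsum(increments)])
    t = np.linspace(0.0, horizon, steps + 1)
    # Closed-form GBM solution driven by the simulated Wiener path.
    prices = s0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * w)
    return t, w, prices

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
t, w, prices = simulate_gbm(s0=100.0, mu=0.05, sigma=0.2,
                            horizon=1.0, steps=252, rng=rng)
print(f"final simulated price: {prices[-1]:.2f}")
```

The exponential form keeps every simulated price positive, which is one reason this model is often preferred over applying the random walk to prices directly.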


Statistical Analysis Of 2017-18 Premier League Match Statistics Using A Regression Analysis In R, Bergen Campbell May 2021

Undergraduate Theses and Capstone Projects

This thesis analyzes the correlation between a team’s statistics and the success of its performances, and develops a predictive model that can be used to forecast final season results for that team. Data from the 2017-2018 Premier League season is gathered and broken down within R to highlight which factors and variables contribute most to the success or downfall of a team. A multiple linear regression model and stepwise selection process are then used to include any factors that are significant in predicting match results.

The predictions about the 17-18 season results based on the model …


An Exploratory Analysis Of The Bgsu Learning Commons Student Usage Data, Emily Eskuri Apr 2021

Honors Projects

The purpose of this study was to explore past student usage data in individualized tutoring sessions from the Learning Commons from two academic years. The Bowling Green State University (BGSU) Learning Commons is a learning assistance center that offers various services, such as individualized tutoring, math assistance, writing assistance, study hours, and academic coaching. There have been limited research studies into how big data and analytics can have an impact in higher education, especially research utilizing predictive analytics.

This project applied analytics to individualized tutoring data in the Learning Commons to create a better understanding of why those trends happen …


A Bayesian Hierarchical Mixture Model With Continuous-Time Markov Chains To Capture Bumblebee Foraging Behavior, Max Thrush Hukill Jan 2021

Honors Projects

The standard statistical methodology for analyzing complex case-control studies in ethology is often limited by approaches that force researchers to model distinct aspects of biological processes in a piecemeal, disjointed fashion. By developing a hierarchical Bayesian model, this work demonstrates that statistical inference in this context can be done using a single coherent framework. To do this, we construct a continuous-time Markov chain (CTMC) to model bumblebee foraging behavior. To connect the experimental design with the CTMC, we employ a mixture model controlled by a logistic regression on the two-factor design matrix. We then show how to infer these model …


Statistical Approaches For Estimation And Comparison Of Brain Functional Connectivity, Jifang Zhao Jan 2021

Theses and Dissertations

Drug addiction can lead to many health-related problems and social concerns. Functional connectivity obtained from functional magnetic resonance imaging (fMRI) data supports a fundamental understanding of such associations. Due to its complex correlation structure and large dimensionality, the modeling and analysis of functional connectivity from neuroimaging data are challenging. By proposing a spatio-temporal model for multi-subject neuroimage data, we incorporate voxel-level spatio-temporal dependencies of whole-brain measurements to improve the accuracy of statistical inference. To tackle large-scale spatio-temporal neuroimage data, we develop a computationally efficient algorithm to estimate the parameters. Our method is used to identify functional connectivity and …


A Transdisciplinary Analysis Of Just Transition Pathways To 100% Renewable Electricity, Adewale Aremu Adesanya Jan 2021

Dissertations, Master's Theses and Master's Reports

The transition to using clean, affordable, and reliable electrical energy is critical for enhancing human opportunities and capabilities. In the United States, many states and localities are engaging in this transition despite the lack of ambitious federal policy support. This research builds on the theoretical framework of the multilevel perspective (MLP) of sociotechnical transitions as well as the concept of energy justice to investigate potential pathways to 100 percent renewable energy (RE) for electricity provision in the U.S. This research seeks to answer the question: what are the technical, policy, and perceptual pathways, barriers, and opportunities for a just transition to …


Modeling Multivariate Hopfield-Transformer Hawkes Process: Application To Sovereign Credit Default Swaps, Mohsen Bahremani Jan 2021

Theses and Dissertations (Comprehensive)

The Hawkes process was developed so that past events contribute to the occurrence times of future events through self-excitation or mutual excitation. However, many real-world data do not satisfy the Hawkes process's assumptions (i.e., positivity, additivity, and exponential decay) and are too complex to be modeled by traditional Hawkes processes, so the neural Hawkes process was developed to tackle these challenges. However, Recurrent Neural Networks (RNN) fail to capture long-term dependencies among multiple point processes, and Transformer Hawkes processes only address temporal characteristics of Hawkes processes. In this thesis, we propose a combination of neural networks and Hawkes processes …
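
For readers unfamiliar with the three assumptions named in this abstract, the sketch below evaluates the conditional intensity of a classical univariate Hawkes process with an exponential kernel: each past event adds a non-negative bump (positivity), the bumps are summed (additivity), and each bump fades exponentially (exponential decay). This is the standard textbook form, not the Hopfield-Transformer model of the thesis; the function name and the parameter values are assumptions.

```python
import math

def hawkes_intensity(t, events, mu, alpha, beta):
    """Conditional intensity of a univariate exponential-kernel Hawkes process.

    mu    : baseline rate (>= 0)
    alpha : jump added by each past event (>= 0, the positivity assumption)
    beta  : exponential decay rate of each event's influence (> 0)
    """
    # Additivity: contributions of all strictly earlier events are summed.
    excitation = sum(alpha * math.exp(-beta * (t - ti))
                     for ti in events if ti < t)
    return mu + excitation

# Hypothetical event times and parameters, for illustration only.
events = [1.0, 1.5, 4.0]
for t in (0.5, 1.6, 3.9, 4.1):
    print(t, round(hawkes_intensity(t, events, mu=0.5, alpha=0.8, beta=1.0), 4))
```

The intensity sits at the baseline before the first event, rises after clustered events, and relaxes back toward the baseline between them, which is the self-exciting behavior the abstract refers to.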


Dimension Reduction Techniques In Regression, Pei Wang Jan 2021

Theses and Dissertations--Statistics

Because of the advances of modern technology, the size of the collected data nowadays is larger and the structure is more complex. To deal with such kinds of data, sufficient dimension reduction (SDR) and reduced rank (RR) regression are two powerful tools. This dissertation focuses on these two tools and it is composed of three projects. In the first project, we introduce a new SDR method through a novel approach of feature filter to recover the central mean subspace exhaustively along with a method to determine the dimension, two variable selection methods, and extensions to multivariate response and large p …


Neither “Post-War” Nor Post-Pregnancy Paranoia: How America’S War On Drugs Continues To Perpetuate Disparate Incarceration Outcomes For Pregnant, Substance-Involved Offenders, Becca S. Zimmerman Jan 2021

Pitzer Senior Theses

This thesis investigates the unique interactions between pregnancy, substance involvement, and race as they relate to the War on Drugs and the hyper-incarceration of women. Using ordinary least squares regression analyses and data from the Bureau of Justice Statistics’ 2016 Survey of Prison Inmates, I examine if (and how) pregnancy status, drug use, race, and their interactions influence two length of incarceration outcomes: sentence length and amount of time spent in jail between arrest and imprisonment. The results collectively indicate that pregnancy decreases length of incarceration outcomes for those offenders who are not substance-involved but not evenhandedly -- benefitting white …


Incorporating Shear Resistance Into Debris Flow Triggering Model Statistics, Noah J. Lyman Dec 2020

Master's Theses

Several regions of the Western United States utilize statistical binary classification models to predict and manage debris flow initiation probability after wildfires. As the occurrence of wildfires and high-intensity rainfall events increases, so does the frequency with which development occurs in the steep and mountainous terrain where these events arise. The resulting intersection brings with it an increasing need to derive improved results from existing models, or develop new models, to reduce the economic and human impacts that debris flows may bring. Any development or change to these models could also theoretically increase the ease of collection, processing, and …


Stochastic Modeling Of Ovarian Follicle Growth In Adult Female Rats, Zhaozhi Li Nov 2020

Annual Symposium on Biomathematics and Ecology Education and Research

No abstract provided.


Applying The Data: Predictive Analytics In Sport, Anthony Teeter, Margo Bergman Nov 2020

Access*: Interdisciplinary Journal of Student Research and Scholarship

The history of wagering predictions and their impact on wide-reaching disciplines such as statistics and economics dates to at least the 1700s, if not before. Predicting the outcomes of sports is a multibillion-dollar business that capitalizes on these tools but is in constant development with the addition of big data analytics methods. Sportsline.com, a popular website for fantasy sports leagues, provides odds predictions in multiple sports, produces proprietary computer models of both winning and losing teams, and provides specific point estimates. To test likely candidates for inclusion in these prediction algorithms, the authors developed a computer model, and test …


Machine Learning Approaches For Improving Prediction Performance Of Structure-Activity Relationship Models, Gabriel Idakwo Aug 2020

Dissertations

In silico bioactivity prediction studies are designed to complement in vivo and in vitro efforts to assess the activity and properties of small molecules. In silico methods such as Quantitative Structure-Activity/Property Relationship (QSAR) are used to correlate the structure of a molecule to its biological property in drug design and toxicological studies. In this body of work, I started with two in-depth reviews into the application of machine learning based approaches and feature reduction methods to QSAR, and then investigated solutions to three common challenges faced in machine learning based QSAR studies.

First, to improve the prediction accuracy of learning …


Edge-Cloud Iot Data Analytics: Intelligence At The Edge With Deep Learning, Ananda Mohon M. Ghosh May 2020

Electronic Thesis and Dissertation Repository

Rapid growth in numbers of connected devices, including sensors, mobile, wearable, and other Internet of Things (IoT) devices, is creating an explosion of data that are moving across the network. To carry out machine learning (ML), IoT data are typically transferred to the cloud or another centralized system for storage and processing; however, this causes latencies and increases network traffic. Edge computing has the potential to remedy those issues by moving computation closer to the network edge and data sources. On the other hand, edge computing is limited in terms of computational power and thus is not well suited for …


How Machine Learning And Probability Concepts Can Improve Nba Player Evaluation, Harrison Miller Jan 2020

CMC Senior Theses

In this paper I break down a scholarly article, written by Sameer K. Deshpande and Shane T. Jensen, that proposed a new method to evaluate NBA players. The NBA (National Basketball Association) is the highest-level professional basketball league in America. The authors proposed building a model that estimates how NBA players affect their teams' chances of winning a game, using machine learning and probability concepts. I preface that by diving into these concepts and their mathematical backgrounds. These concepts include building a linear model using ordinary least squares method, the bias …


Supervised Classification Using Finite Mixture Copula, Sumen Sen, Norou Diawara Aug 2017

Mathematics & Statistics Faculty Publications

The use of copulas for statistical classification is recent and gaining popularity. For example, statistical classification using copulas has been proposed for automatic character recognition, medical diagnostics, and most recently data mining. Classical discrimination rules assume normality, but in the current data age this assumption is often questionable. In fact, features of data could be a mixture of discrete and continuous random variables. In this paper, mixture copula densities are used to model class conditional distributions. Such types of densities are useful when the marginal densities of the vector of features are not normally distributed and are of a mixed …


Time Series Analysis For Psychological Research: Examining And Forecasting Change, Andrew T. Jebb, Louis Tay, Wei Wang, Qiming Huang Jun 2015

Publications and Research

Psychological research has increasingly recognized the importance of integrating temporal dynamics into its theories, and innovations in longitudinal designs and analyses have allowed such theories to be formalized and tested. However, psychological researchers may be relatively unequipped to analyze such data, given its many characteristics and the general complexities involved in longitudinal modeling. The current paper introduces time series analysis to psychological research, an analytic domain that has been essential for understanding and predicting the behavior of variables across many diverse fields. First, the characteristics of time series data are discussed. Second, different time series modeling techniques are surveyed that …


Gulf-Wide Decreases In The Size Of Large Coastal Sharks Documented By Generations Of Fishermen, Sean P. Powers, F. Joel Frodrie, Steven B. Scyphers, J. Marcus Drymon, Robert L. Shipp, Gregory W. Stunz Jan 2013

University Faculty and Staff Publications

Large sharks are top predators in most coastal and marine ecosystems throughout the world, and evidence of their reduced prominence in marine ecosystems has been a serious concern for fisheries and ecosystem management. Unfortunately, quantitative data to document the extent, timing, and consequences of changes in shark populations are scarce, thwarting examination of long-term (decadal, century) trends, and reconstructions based on incomplete data sets have been the subject of debate. Absence of quantitative descriptors of past ecological conditions is a generic problem facing many fields of science but is particularly troublesome for fisheries scientists who must develop specific targets for …