Statistical Models Commons


Theory and Algorithms


Articles 1 - 30 of 33

Full-Text Articles in Statistical Models

Reducing Food Scarcity: The Benefits Of Urban Farming, S.A. Claudell, Emilio Mejia Dec 2023


Journal of Nonprofit Innovation

Urban farming can enhance the lives of communities and help reduce food scarcity. This paper presents a conceptual prototype of an efficient urban farming community that can be scaled for a single apartment building or an entire community across all global geoeconomic regions, including densely populated cities and rural, developing towns and communities. When deployed in coordination with smart crop choices, local farm support, and efficient transportation, the result is not just sustainability but also increased access to fresh produce, optimized nutritional value, elimination of ‘forever chemicals’, reduced transportation costs, and global environmental benefits.

Imagine Doris, who is …


Self-Learning Algorithms For Intrusion Detection And Prevention Systems (IDPS), Juan E. Nunez, Roger W. Tchegui Donfack, Rohit Rohit, Hayley Horn Mar 2023


SMU Data Science Review

Today, there is an increased risk to data privacy and information security due to cyberattacks that compromise data reliability and accessibility. New machine learning models are needed to detect and prevent these cyberattacks. One application of these models is cybersecurity threat detection and prevention systems that create a baseline of a network's traffic patterns to detect anomalies without needing pre-labeled data, thus enabling abnormal network events to be identified as threats. This research explored algorithms that can help automate anomaly detection on an enterprise network using Canadian Institute for Cybersecurity data. This study demonstrates that Neural Networks with Bayesian …
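
As context for the kind of unsupervised baseline described here, the sketch below uses scikit-learn's IsolationForest on a few flow-level features; the synthetic data, feature count, and contamination rate are illustrative assumptions, not the study's configuration.

```python
# Hypothetical sketch: unsupervised anomaly detection on network-flow features.
# Feature values and the contamination rate are assumptions for illustration only.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Stand-in for flow statistics (e.g., duration, bytes, packets); replace with real data.
X_train = rng.lognormal(mean=1.0, sigma=0.5, size=(5000, 3))   # mostly benign traffic
X_new = np.vstack([rng.lognormal(1.0, 0.5, (95, 3)),
                   rng.lognormal(4.0, 0.5, (5, 3))])           # a few anomalous flows

scaler = StandardScaler().fit(X_train)
detector = IsolationForest(n_estimators=200, contamination=0.01, random_state=0)
detector.fit(scaler.transform(X_train))                        # learn a traffic baseline

labels = detector.predict(scaler.transform(X_new))             # -1 = anomaly, 1 = normal
print(f"flagged {int((labels == -1).sum())} of {len(labels)} flows as anomalous")
```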


Application Of Probabilistic Ranking Systems On Women's Junior Division Beach Volleyball, Cameron Stewart, Michael Mazel, Bivin Sadler Sep 2022


SMU Data Science Review

Women’s beach volleyball is one of the fastest growing collegiate sports today. The increase in popularity has come with an increase in valuable scholarship opportunities across the country. With thousands of athletes to sort through, college scouts depend on websites that aggregate tournament results and rank players nationally. This project partnered with the company Volleyball Life, which is the current market leader in the ranking space for junior beach volleyball players. Utilizing the tournament information provided by Volleyball Life, this study explored replacements for the current ranking systems, which are designed to aggregate player points from recent tournament placements. Three …
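
The abstract does not name the candidate systems, but a common probabilistic alternative to point aggregation is an Elo-style rating update; the sketch below is illustrative only, with a conventional K-factor rather than anything taken from Volleyball Life.

```python
# Hedged illustration of an Elo-style rating update for head-to-head results.
# The K-factor and starting rating are conventional defaults, not Volleyball Life's values.
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that player/team A beats B under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update(r_a: float, r_b: float, a_won: bool, k: float = 32.0) -> tuple[float, float]:
    """Return post-match ratings after A plays B."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    return r_a + k * (s_a - e_a), r_b + k * ((1.0 - s_a) - (1.0 - e_a))

ratings = {"team_a": 1500.0, "team_b": 1500.0}
ratings["team_a"], ratings["team_b"] = update(ratings["team_a"], ratings["team_b"], a_won=True)
print(ratings)  # the winner gains exactly what the loser gives up
```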


A Simple Algorithm For Generating A New Two Sample Type-II Progressive Censoring With Applications, E. M. Shokr, Rashad Mohamed El-Sagheer, Mahmoud Mansour, H. M. Faied, B. S. El-Desouky Jan 2022


Basic Science Engineering

In this article, we introduce a simple algorithm for generating a new two-sample Type-II progressive censoring scheme. It is observed that the proposed algorithm can be applied to any continuous probability distribution. Moreover, the description of the model and the necessary assumptions are discussed. In addition, the steps of the generation algorithm, along with the corresponding programming steps, are illustrated on a real example. The inference of two Weibull Frechet populations is discussed under the proposed algorithm. Both classical and Bayesian inferential approaches to the distribution parameters are discussed. Furthermore, approximate confidence intervals are constructed based on the asymptotic distribution of the maximum …
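
For background, the widely used one-sample recipe for generating a progressively Type-II censored sample from any continuous distribution (via uniform variates and the quantile function) can be sketched as follows; the removal pattern and the exponential quantile are illustrative choices, not the article's two-sample scheme.

```python
# Sketch of the classical one-sample algorithm for generating a progressively
# Type-II censored sample from a continuous distribution F with quantile F^{-1}.
# The exponential quantile and removal pattern are illustrative assumptions.
import numpy as np

def progressive_type2_sample(quantile, removals, rng=None):
    """removals[i] units are withdrawn at the i-th observed failure."""
    rng = np.random.default_rng(rng)
    R = np.asarray(removals)
    m = len(R)
    W = rng.uniform(size=m)
    # exponents: i + R_m + R_{m-1} + ... + R_{m-i+1}
    tail_sums = np.cumsum(R[::-1])                  # R_m, R_m + R_{m-1}, ...
    V = W ** (1.0 / (np.arange(1, m + 1) + tail_sums))
    U = 1.0 - np.cumprod(V[::-1])                   # U_i = 1 - V_m ... V_{m-i+1}
    return quantile(U)                              # ordered censored sample from F

# Example: n = 20 units, m = 10 observed failures, one removal after each failure.
sample = progressive_type2_sample(lambda u: -np.log(1.0 - u), removals=[1] * 10, rng=42)
print(np.round(sample, 3))
```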


Ensemble Data Fitting For Bathymetric Models Informed By Nominal Data, Samantha Zambo Aug 2021


Dissertations

Due to the difficulty and expense of collecting bathymetric data, modeling is the primary tool to produce detailed maps of the ocean floor. Current modeling practices typically utilize only one interpolator; the industry standard is splines-in-tension.

In this dissertation we introduce a new nominal-informed ensemble interpolator designed to improve modeling accuracy in regions of sparse data. The method is guided by a priori domain knowledge provided by artificially intelligent classifiers. We recast geomorphological classifications, such as ‘seamount’ or ‘ridge’, as nominal data, which we utilize as foundational shapes in an expanded ordinary least squares regression-based algorithm. To our knowledge …


Modified Firearm Discharge Residue Analysis Utilizing Advanced Analytical Techniques, Complexing Agents, And Quantum Chemical Calculations, William J. Feeney Jan 2021


Graduate Theses, Dissertations, and Problem Reports

The use of gunshot residue (GSR) or firearm discharge residue (FDR) evidence faces challenges because of instrumental and analytical limitations and the difficulty of evaluating and communicating evidentiary value. For instance, categorizing GSR based only on elemental analysis of single, spherical particles is becoming insufficient because newer ammunition formulations produce residues with varying particle morphology and composition. Also, one common criticism of GSR practitioners is that their reports focus on the presence or absence of GSR in an item without providing an assessment of the weight of the evidence. Such reports leave the end-user with unanswered questions, …


Extensions Of Classification Method Based On Quantiles, Yuanhao Lai Jun 2020


Electronic Thesis and Dissertation Repository

This thesis deals with the problem of classification in general, with a particular focus on heavy-tailed or skewed data. The classification problem is first formalized through statistical learning theory, and several important classification methods are reviewed; among them, distance-based classifiers, including the median-based classifier and the quantile-based classifier (QC), are especially useful for heavy-tailed or skewed inputs. However, QC is limited by its model capacity and by the accumulation of errors in high dimensions. The objective of this study is to investigate more general methods while retaining the merits of QC.

We present four extensions of QC, which appear in chronological …
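
As a reference point for the QC reviewed above, a minimal component-wise quantile-based classifier can be sketched as follows; the fixed quantile level and the synthetic heavy-tailed data are illustrative assumptions (in practice the level is tuned).

```python
# Minimal sketch of a component-wise quantile-based classifier (QC-style):
# each class is summarized by its per-feature theta-quantiles, and a point is
# assigned to the class with the smallest total quantile (pinball) distance.
# The fixed theta and synthetic data below are illustrative assumptions.
import numpy as np

def fit_quantiles(X, y, theta):
    return {k: np.quantile(X[y == k], theta, axis=0) for k in np.unique(y)}

def quantile_distance(x, q, theta):
    u = x - q
    return np.sum(np.where(u >= 0, theta * u, (theta - 1.0) * u))   # pinball loss per feature

def predict(X, quantiles, theta):
    classes = list(quantiles)
    d = np.array([[quantile_distance(x, quantiles[k], theta) for k in classes] for x in X])
    return np.array(classes)[d.argmin(axis=1)]

rng = np.random.default_rng(0)
X0 = rng.lognormal(0.0, 1.0, (200, 5))            # heavy-tailed class 0
X1 = rng.lognormal(0.6, 1.0, (200, 5))            # shifted class 1
X, y = np.vstack([X0, X1]), np.repeat([0, 1], 200)
q = fit_quantiles(X, y, theta=0.3)
print("training accuracy:", (predict(X, q, theta=0.3) == y).mean())
```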


Predictive Modeling Of Asynchronous Event Sequence Data, Jin Shang May 2020


LSU Doctoral Dissertations

Large volumes of temporal event data, such as online check-ins and electronic records of hospital admissions, are becoming increasingly available in a wide variety of applications, including healthcare analytics, smart cities, and social network analysis. These temporal events are often asynchronous, interdependent, and self-exciting. For example, in a patient's diagnosis events, elevated risk persists for a patient who has recently been at risk. Machine learning that leverages event sequence data can improve the prediction accuracy of future events and provide valuable services. For example, in e-commerce and network traffic diagnosis, the analysis of user activities can be …
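
The self-exciting behavior mentioned here is commonly formalized with a Hawkes process; the sketch below shows its conditional intensity with an exponential kernel, using made-up parameter values purely for illustration.

```python
# Illustrative conditional intensity of a univariate Hawkes (self-exciting) process
# with an exponential kernel: lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta (t - t_i)).
# Parameter values are assumptions chosen only to demonstrate the shape of the model.
import numpy as np

def hawkes_intensity(t, event_times, mu=0.2, alpha=0.8, beta=1.0):
    """Rate of new events at time t, boosted by each past event and decaying over time."""
    past = event_times[event_times < t]
    return mu + alpha * np.sum(np.exp(-beta * (t - past)))

events = np.array([1.0, 1.3, 1.4, 4.0])      # e.g., recent admission times
for t in (0.5, 1.5, 2.5, 5.0):
    print(f"lambda({t:.1f}) = {hawkes_intensity(t, events):.3f}")
```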


Data-Driven Investment Decisions In P2P Lending: Strategies Of Integrating Credit Scoring And Profit Scoring, Yan Wang Apr 2020


Doctor of Data Science and Analytics Dissertations

In this dissertation, we develop and discuss several loan evaluation methods to guide the investment decisions for peer-to-peer (P2P) lending. In evaluating loans, credit scoring and profit scoring are the two widely utilized approaches. Credit scoring aims at minimizing the risk while profit scoring aims at maximizing the profit. This dissertation addresses the strengths and weaknesses of each scoring method by integrating them in various ways in order to provide the optimal investment suggestions for different investors. Before developing the methods for loan evaluation at the individual level, we applied the state-of-the-art method called the Long Short Term Memory (LSTM) …


Development Of Gaussian Learning Algorithms For Early Detection Of Alzheimer's Disease, Chen Fang Mar 2020


FIU Electronic Theses and Dissertations

Alzheimer’s disease (AD) is the most common form of dementia, affecting 10% of the population over the age of 65, and the growing costs of managing AD were estimated at $259 billion in data reported in 2017 by the Alzheimer's Association. Moreover, with cognitive decline, the daily lives of affected persons and their families are severely impacted. Given an early diagnosis of AD and its prodromal stage, mild cognitive impairment (MCI), early treatment may help patients preserve quality of life and slow the progression of the disease, even though the underlying disease cannot …


Machine Learning In Support Of Electric Distribution Asset Failure Prediction, Robert D. Flamenbaum, Thomas Pompo, Christopher Havenstein, Jade Thiemsuwan Aug 2019


SMU Data Science Review

In this paper, we present novel approaches to predicting asset failure in the electric distribution system. Failures in overhead power lines and their associated equipment in particular pose significant financial and environmental threats to electric utilities. Electric device failure furthermore poses a burden on customers and can pose serious risk to life and livelihood. Working with asset data acquired from an electric utility in Southern California, and incorporating environmental and geospatial data from around the region, we applied a Random Forest methodology to predict which overhead distribution lines are most vulnerable to failure. Our results provide evidence …


Machine Learning Predicts Aperiodic Laboratory Earthquakes, Olha Tanyuk, Daniel Davieau, Charles South, Daniel W. Engels Aug 2019


SMU Data Science Review

In this paper we find a pattern of aperiodic seismic signals that precede earthquakes at any time in a laboratory earthquake’s cycle, using a small window of time. We use a data set from a classic laboratory experiment with several stick-slip displacements (earthquakes), a type of experiment that has been studied as a simulation of seismologic faults for decades. These data exhibit behavior similar to natural earthquakes, so the same approach may work in predicting their timing. Here we show that by applying a random forest machine learning technique to the acoustic signal emitted by a laboratory …
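
A hedged sketch of this window-based workflow: summary statistics are computed over short windows of an acoustic signal and a random forest regresses time-to-failure. The synthetic signal, window length, and feature choices are illustrative, not the authors' exact setup.

```python
# Sketch of window-based time-to-failure regression from an acoustic signal.
# The synthetic signal, window length, and features are illustrative assumptions.
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_windows, window = 500, 1000
signal = rng.normal(0, 1 + np.linspace(0, 3, n_windows * window))   # variance grows toward failure
time_to_failure = np.linspace(10.0, 0.0, n_windows)                 # one label per window

windows = signal.reshape(n_windows, window)
X = np.column_stack([windows.std(axis=1),
                     stats.kurtosis(windows, axis=1),
                     np.quantile(np.abs(windows), 0.99, axis=1)])   # simple acoustic features

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[:400], time_to_failure[:400])
pred = model.predict(X[400:])
print("mean absolute error:", np.abs(pred - time_to_failure[400:]).mean().round(3))
```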


Texture-Based Deep Neural Network For Histopathology Cancer Whole Slide Image (WSI) Classification, Nelson Zange Tsaku Aug 2019


Master of Science in Computer Science Theses

Automatic histopathological Whole Slide Image (WSI) analysis for cancer classification has been highlighted along with advancements in microscopic imaging techniques. However, manual examination and diagnosis with WSIs are time-consuming and tiresome. Recently, deep convolutional neural networks have succeeded in histopathological image analysis. In this paper, we propose a novel cancer texture-based deep neural network (CAT-Net) that learns scalable texture features from histopathological WSIs. The innovation of CAT-Net is twofold: (1) capturing invariant spatial patterns with dilated convolutional layers and (2) reducing model complexity while improving performance. Moreover, CAT-Net can provide discriminative texture patterns formed on cancerous regions of histopathological …
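
A minimal PyTorch sketch of the dilated-convolution idea credited to CAT-Net appears below; the channel counts and dilation rates are illustrative and do not reproduce the published architecture.

```python
# Minimal PyTorch sketch of a dilated convolutional block for texture features.
# Channel counts and dilation rates are illustrative, not the CAT-Net architecture.
import torch
import torch.nn as nn

class DilatedTextureBlock(nn.Module):
    def __init__(self, in_ch=3, out_ch=32):
        super().__init__()
        # Increasing dilation widens the receptive field without adding parameters,
        # which helps capture spatial texture patterns at several scales.
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=4, dilation=4), nn.ReLU(),
        )

    def forward(self, x):
        return self.block(x)

patch = torch.randn(1, 3, 256, 256)          # one RGB tile cropped from a WSI
print(DilatedTextureBlock()(patch).shape)    # torch.Size([1, 32, 256, 256])
```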


Statistical And Machine Learning Methods Evaluated For Incorporating Soil And Weather Into Corn Nitrogen Recommendations, Curtis J. Ransom, Newell R. Kitchen, James J. Camberato, Paul R. Carter, Richard B. Ferguson, Fabián G. Fernández, David W. Franzen, Carrie A. M. Laboski, D. Brenton Myers, Emerson D. Nafziger, John E. Sawyer, John F. Shanahan Aug 2019


John E. Sawyer

Nitrogen (N) fertilizer recommendation tools could be improved for estimating corn (Zea mays L.) N needs by incorporating site-specific soil and weather information. However, an evaluation of analytical methods is needed to determine the success of incorporating this information. The objectives of this research were to evaluate statistical and machine learning (ML) algorithms for utilizing soil and weather information for improving corn N recommendation tools. Eight algorithms [stepwise, ridge regression, least absolute shrinkage and selection operator (Lasso), elastic net regression, principal component regression (PCR), partial least squares regression (PLSR), decision tree, and random forest] were evaluated using a dataset …
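
A brief sketch of how such an algorithm comparison is typically run with cross-validation; the synthetic covariates stand in for soil and weather predictors, and only three of the eight evaluated algorithms are shown.

```python
# Sketch of a cross-validated comparison of regression algorithms for predicting
# an agronomic response. Data are synthetic placeholders; only three of the
# eight algorithms named in the abstract are shown.
import numpy as np
from sklearn.linear_model import Ridge, Lasso
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))                      # stand-ins for soil/weather covariates
y = 120 + 15 * X[:, 0] - 10 * X[:, 1] + 5 * X[:, 2] * X[:, 3] + rng.normal(0, 10, 300)

models = {"ridge": Ridge(alpha=1.0),
          "lasso": Lasso(alpha=0.1),
          "random forest": RandomForestRegressor(n_estimators=300, random_state=0)}

for name, model in models.items():
    rmse = -cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error")
    print(f"{name:>13}: RMSE = {rmse.mean():.1f} +/- {rmse.std():.1f}")
```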


Computational Analysis Of Large-Scale Trends And Dynamics In Eukaryotic Protein Family Evolution, Joseph Boehm Ahrens Mar 2019


FIU Electronic Theses and Dissertations

The myriad protein-coding genes found in present-day eukaryotes arose from a combination of speciation and gene duplication events, spanning more than one billion years of evolution. Notably, as these proteins evolved, the individual residues at each site in their amino acid sequences were replaced at markedly different rates. The relationship between protein structure, protein function, and site-specific rates of amino acid replacement is a topic of ongoing research. Additionally, there is much interest in the different evolutionary constraints imposed on sequences related by speciation (orthologs) versus sequences related by gene duplication (paralogs). A principal aim of this dissertation is to …


Improving VIX Futures Forecasts Using Machine Learning Methods, James Hosker, Slobodan Djurdjevic, Hieu Nguyen, Robert Slater Jan 2019


SMU Data Science Review

The problem of forecasting market volatility is a difficult task for most fund managers. Volatility forecasts are used for risk management, alpha (risk) trading, and the reduction of trading friction. Improving the forecasts of future market volatility assists fund managers in adding or reducing risk in their portfolios as well as in increasing hedges to protect their portfolios in anticipation of a market sell-off event. Our analysis compares three existing financial models that forecast future market volatility using the Chicago Board Options Exchange Volatility Index (VIX) to six machine/deep learning supervised regression methods. This analysis determines which models provide best …


Modeling Stochastically Intransitive Relationships In Paired Comparison Data, Ryan Patrick Alexander Mcshane Jan 2019


Statistical Science Theses and Dissertations

If the Warriors beat the Rockets and the Rockets beat the Spurs, does that mean that the Warriors are better than the Spurs? Sophisticated fans would argue that the Warriors are better by the transitive property, but could Spurs fans make a legitimate argument that their team is better despite this chain of evidence?

We first explore the nature of intransitive (rock-scissors-paper) relationships with a graph theoretic approach to the method of paired comparisons framework popularized by Kendall and Smith (1940). Then, we focus on the setting where all pairs of items, teams, players, or objects have been compared to …
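
For context, the paired-comparison framework is often fit with a Bradley-Terry-style model, sketched below with made-up win counts; note that this baseline enforces stochastic transitivity, which is precisely the assumption the dissertation goes on to relax.

```python
# Illustrative Bradley-Terry fit for paired-comparison data: each team gets a
# latent strength, and P(i beats j) = sigmoid(s_i - s_j). This baseline enforces
# stochastic transitivity -- the assumption the dissertation relaxes.
import numpy as np
from scipy.optimize import minimize

teams = ["Warriors", "Rockets", "Spurs"]
# (winner, loser) records; the counts are made up for illustration and form a cycle.
results = [(0, 1)] * 6 + [(1, 2)] * 5 + [(2, 0)] * 4

def neg_log_lik(s):
    s = np.append(s, 0.0)                      # fix the last strength at 0 for identifiability
    return -sum(np.log(1.0 / (1.0 + np.exp(-(s[w] - s[l])))) for w, l in results)

fit = minimize(neg_log_lik, x0=np.zeros(len(teams) - 1))
strengths = dict(zip(teams, np.append(fit.x, 0.0)))
print({k: round(v, 2) for k, v in strengths.items()})
```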


Quantifying Human Biological Age: A Machine Learning Approach, Syed Ashiqur Rahman Jan 2019


Graduate Theses, Dissertations, and Problem Reports

Quantifying human biological age is an important and difficult challenge. Different biomarkers and numerous approaches have been studied for biological age prediction, each with its advantages and limitations. In this work, we first introduce a new anthropometric measure (called Surface-based Body Shape Index, SBSI) that accounts for both body shape and body size, and evaluate its performance as a predictor of all-cause mortality. We analyzed data from the National Health and Nutrition Examination Survey (NHANES). Based on the analysis, we introduce a new body shape index constructed from four important anthropometric determinants of body shape and body size: body …


Development Of A Slab-Based Monte Carlo Proton Dose Algorithm With A Robust Material-Dependent Nuclear Halo Model, John Wesley Chapman Jr Jun 2018


LSU Doctoral Dissertations

Pencil beam algorithms (PBAs) are often utilized for dose calculation in proton therapy treatment planning because they are fast and accurate under most conditions. However, as discussed in Chapman et al (2017), the accuracy of a PBA can be limited under certain conditions because of two major assumptions: (1) the central-axis semi-infinite slab approximation; and, (2) the lack of material dependence in the nuclear halo model. To address these limitations, we transported individual protons using a class II condensed history Monte Carlo and added a novel energy loss method that scaled the nuclear halo equation in water to arbitrary geometry. …


On The Performance Of Some Poisson Ridge Regression Estimators, Cynthia Zaldivar Mar 2018


FIU Electronic Theses and Dissertations

Multiple regression models play an important role in analyzing and making predictions about data. Prediction accuracy becomes lower when two or more explanatory variables in the model are highly correlated. One solution is to use ridge regression. The purpose of this thesis is to study the performance of available ridge regression estimators for Poisson regression models in the presence of moderately to highly correlated variables. As performance criteria, we use mean square error (MSE), mean absolute percentage error (MAPE), and percentage of times the maximum likelihood (ML) estimator produces a higher MSE than the ridge regression estimator. A Monte Carlo …
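
As a concrete reference point, scikit-learn's PoissonRegressor fits a Poisson GLM with an L2 (ridge) penalty controlled by alpha; the toy simulation below with two highly correlated predictors only gestures at the thesis's Monte Carlo design.

```python
# Toy illustration of ridge-penalized Poisson regression with correlated predictors.
# PoissonRegressor's alpha is an L2 penalty; the data-generating values are made up.
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=n)          # highly correlated with x1
X = np.column_stack([x1, x2])
y = rng.poisson(np.exp(0.3 + 0.4 * x1 + 0.4 * x2))  # counts from the true Poisson mean

for alpha in (0.01, 1.0, 10.0):
    model = PoissonRegressor(alpha=alpha, max_iter=1000).fit(X, y)
    print(f"alpha={alpha:>5}: coef={np.round(model.coef_, 2)}")
```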


Models As Weapons: Review Of Weapons Of Math Destruction: How Big Data Increases Inequality And Threatens Democracy By Cathy O’Neil (2016), Samuel L. Tunstall Jan 2018


Numeracy

Cathy O’Neil. 2016. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy (New York, NY: Crown) 272 pp. ISBN 978-0553418811.

Accessible to a wide readership, Cathy O’Neil’s Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy provides a lucid yet alarming account of the extensive reach of mathematical models in influencing all of our lives. With a particular eye towards social justice, O’Neil not only warns modelers to be cognizant of the effects of their work on real people—especially vulnerable groups who have less power to fight back—but also encourages laypersons to take initiative …


Estimating The Respiratory Lung Motion Model Using Tensor Decomposition On Displacement Vector Field, Kingston Kang Jan 2018


Theses and Dissertations

Modern big data often emerge as tensors. Standard statistical methods are inadequate to deal with datasets of large volume, high dimensionality, and complex structure. Therefore, it is important to develop algorithms such as low-rank tensor decomposition for data compression, dimensionality reduction, and approximation.

With the advancement in technology, high-dimensional images are becoming ubiquitous in the medical field. In lung radiation therapy, the respiratory motion of the lung introduces variabilities during treatment as the tumor inside the lung is moving, which brings challenges to the precise delivery of radiation to the tumor. Several approaches to quantifying this uncertainty propose using a …
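
One simple way to see the low-rank idea is a truncated higher-order SVD of a small three-way array; the ranks and synthetic tensor below are illustrative assumptions and far smaller than a real displacement vector field.

```python
# Minimal HOSVD-style sketch: compress a 3-way array (e.g., a displacement field
# over x, y, and respiratory phase) via truncated SVDs of its mode unfoldings.
# The ranks and synthetic tensor are illustrative assumptions.
import numpy as np

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T, ranks):
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])                      # leading left singular vectors
    core = T
    for mode, U in enumerate(factors):                # project onto each factor space
        core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core, factors

rng = np.random.default_rng(0)
A, B, C = rng.normal(size=(40, 5)), rng.normal(size=(40, 5)), rng.normal(size=(10, 5))
T = np.einsum('ir,jr,kr->ijk', A, B, C) + 0.1 * rng.normal(size=(40, 40, 10))

core, factors = hosvd(T, ranks=(10, 10, 5))
recon = core
for mode, U in enumerate(factors):                    # rebuild the compressed tensor
    recon = np.moveaxis(np.tensordot(U, np.moveaxis(recon, mode, 0), axes=1), 0, mode)
print("relative reconstruction error:", np.linalg.norm(T - recon) / np.linalg.norm(T))
```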


Classification With Large Sparse Datasets: Convergence Analysis And Scalable Algorithms, Xiang Li Jul 2017


Electronic Thesis and Dissertation Repository

Large and sparse datasets, such as user ratings over a large collection of items, are common in the big data era. Many applications need to classify the users or items based on the high-dimensional and sparse data vectors, e.g., to predict the profitability of a product or the age group of a user, etc. Linear classifiers are popular choices for classifying such datasets because of their efficiency. In order to classify the large sparse data more effectively, the following important questions need to be answered.

1. Sparse data and convergence behavior. How different properties of a dataset, such as …


HPCNMF: A High-Performance Toolbox For Non-Negative Matrix Factorization, Karthik Devarajan, Guoli Wang Feb 2016


COBRA Preprint Series

Non-negative matrix factorization (NMF) is a widely used machine learning algorithm for dimension reduction of large-scale data. It has found successful applications in a variety of fields such as computational biology, neuroscience, natural language processing, information retrieval, image processing and speech recognition. In bioinformatics, for example, it has been used to extract patterns and profiles from genomic and text-mining data as well as in protein sequence and structure analysis. While the scientific performance of NMF is very promising in dealing with high dimensional data sets and complex data structures, its computational cost is high and sometimes could be critical for …
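
For reference, the classical multiplicative-update rules for Frobenius-norm NMF (Lee and Seung) are sketched below on a small random matrix; the preprint's toolbox targets far larger problems, so treat this only as a statement of the underlying algorithm.

```python
# Sketch of classical multiplicative-update NMF on a small non-negative matrix.
# Problem size and rank are illustrative; the toolbox targets large-scale data.
import numpy as np

def nmf(X, rank, n_iter=500, eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank))
    H = rng.random((rank, X.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)     # update coefficients
        W *= (X @ H.T) / (W @ H @ H.T + eps)     # update basis vectors
    return W, H

X = np.random.default_rng(1).random((100, 40))
W, H = nmf(X, rank=5)
print("relative reconstruction error:", np.linalg.norm(X - W @ H) / np.linalg.norm(X))
```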


Prediction: The Quintessential Model Validation Test, Wayne Wakeland Oct 2015


Systems Science Friday Noon Seminar Series

It is essential to objectively test how well policy models predict real-world behavior. The method used to support this assertion involves reviewing three system dynamics (SD) policy models, emphasizing the degree to which each model fit the historical outcome data and how well model-predicted outcomes matched real-world outcomes as they unfolded. Findings indicate that while historical model agreement is a favorable indication of model validity, the act of making predictions without knowing the actual data, and comparing these predictions to actual data, can reveal model weaknesses that might be overlooked when all of the available …


Modeling Traffic At An Intersection, Kaleigh L. Mulkey, Saniita K. Fasenntao Apr 2015


Symposium of Student Scholars

The main purpose of this project is to build a mathematical model for traffic at a busy intersection. We use elements of Queueing Theory to build our model: the vehicles driving into the intersection are the “arrival process” and the stop light in the intersection is the “server.”

We collected traffic data on the number of vehicles arriving at the intersection, the duration of green and red lights, and the number of vehicles going through the intersection during a green light. We built a SAS macro to simulate traffic based on parameters derived from the data.

In our program …
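
The study's simulation was written as a SAS macro; the short Python sketch below restates the same queueing logic (Poisson arrivals, a fixed number of departures per green phase) with made-up parameter values.

```python
# Toy restatement of the queueing logic in Python (the study used a SAS macro):
# vehicles arrive as a Poisson process, queue during red, and clear during green
# at a fixed departure capacity. All parameter values below are made up.
import numpy as np

rng = np.random.default_rng(0)
arrival_rate = 0.15        # vehicles per second
green, red = 30, 45        # light durations in seconds
departures_per_green = 15  # vehicles that can clear during one green phase

queue, max_queue = 0, 0
for _cycle in range(100):
    queue += rng.poisson(arrival_rate * red)            # arrivals while the light is red
    queue += rng.poisson(arrival_rate * green)          # arrivals while the light is green
    queue = max(0, queue - departures_per_green)        # vehicles clearing on green
    max_queue = max(max_queue, queue)

print("queue after 100 cycles:", queue, "| worst queue observed:", max_queue)
```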


Identifying Key Variables And Interactions In Statistical Models Of Building Energy Consumption Using Regularization, David Hsu Mar 2015


David Hsu

Statistical models can only be as good as the data put into them. Data about energy consumption continues to grow, particularly its non-technical aspects, but these variables are often interpreted differently among disciplines, datasets, and contexts. Selecting key variables and interactions is therefore an important step in achieving more accurate predictions, better interpretation, and identification of key subgroups for further analysis.

This paper therefore makes two main contributions to the modeling and analysis of energy consumption of buildings. First, it introduces regularization, also known as penalized regression, for principled selection of variables and interactions. Second, this approach is demonstrated by …
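
A minimal sketch of the regularization idea, using a lasso over main effects and pairwise interactions; the synthetic covariates are placeholders and not the building-energy variables analyzed in the paper.

```python
# Sketch of penalized regression (lasso) for selecting main effects and pairwise
# interactions. Variable names and data are placeholders, not the paper's dataset.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))                               # stand-ins for building covariates
y = 2 * X[:, 0] + 1.5 * X[:, 1] * X[:, 2] + rng.normal(0, 1, 400)

poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_int = StandardScaler().fit_transform(poly.fit_transform(X))  # main effects + interactions
lasso = LassoCV(cv=5).fit(X_int, y)                            # penalty chosen by cross-validation

names = poly.get_feature_names_out([f"x{i}" for i in range(5)])
print({n: round(c, 2) for n, c in zip(names, lasso.coef_) if abs(c) > 0.05})
```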


Caimans - Semantic Platform For Advance Content Mining (Sketch Wp), Salvo Reina Jul 2013


Salvo Reina

A middleware software platform was created for the automatic classification of textual content. The requirements worksheet and the original flow sketches are published.


Iterative Statistical Verification Of Probabilistic Plans, Colin M. Potts May 2013


Lawrence University Honors Projects

Artificial intelligence seeks to create intelligent agents. An agent can be anything: an autopilot, a self-driving car, a robot, a person, or even an anti-virus system. While the current state of the art may not achieve intelligence (a rather dubious thing to quantify), it certainly achieves a sense of autonomy. A key aspect of an autonomous system is its ability to maintain and guarantee safety, defined as avoiding some set of undesired outcomes. The piece of software responsible for this is called a planner, which is essentially an automated problem solver. An advantage computer planners have over humans is their ability to consider and …


Retrieval Of Sub-Pixel-Based Fire Intensity And Its Application For Characterizing Smoke Injection Heights And Fire Weather In North America, David Peterson Sep 2012


Department of Earth and Atmospheric Sciences: Dissertations, Theses, and Student Research

For over two decades, satellite sensors have provided the locations of global fire activity with ever-increasing accuracy. However, the ability to measure fire intensity, known as fire radiative power (FRP), and its potential relationships to meteorology and smoke plume injection heights are currently limited by the pixel resolution. This dissertation describes the development of a new, sub-pixel-based FRP calculation (FRPf) for fire pixels detected by the MODerate Resolution Imaging Spectroradiometer (MODIS) fire detection algorithm (Collection 5), which is subsequently applied to several large wildfire events in North America. The methodology inherits an earlier bi-spectral algorithm for retrieving sub-pixel …