Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 27 of 27

Full-Text Articles in Physical Sciences and Mathematics

Session 8: Machine Learning Based Behavior Of Non-Opec Global Supply In Crude Oil Price Determinism, Mofe Jeje Feb 2024

SDSU Data Science Symposium

While studies of global oil price variability occasioned by OPEC crude oil supply are well documented in the energy literature, the impact of non-OPEC global oil supply on price variability has not received commensurate attention. Given this gap, the primary objective of this study is to estimate the magnitude of oil price determinism explained by the share of non-OPEC global crude oil supply. Using secondary sources of data collection, data for the target variable will be collected from the US Federal Reserve, as it relates to annual crude oil price variability, while …


Predicting Crop Yield Using Remote Sensing Data, Mary Row, Jung-Han Kimn, Hossein Moradi Feb 2024

SDSU Data Science Symposium

Accurate crop yield predictions can help farmers make adjustments or changes in their farming practices to optimize their harvest. Remote sensing is an inexpensive approach to collecting massive amounts of data that can be utilized for predicting crop yield. This study employed linear regression and spatial linear models to predict soybean yield with data from Landsat 8 OLI. Each model was built using only spectral bands of the satellite, only vegetation indices, and both spectral bands and vegetation indices. All analysis was based on data collected from two fields in South Dakota from the 2019 and 2021 …
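As a rough illustration of the band-plus-index regression idea, a sketch on synthetic reflectance values (not the study's Landsat 8 data, and without the spatial linear models) might look like:

```python
# Hypothetical sketch: regress a synthetic soybean yield on two spectral
# bands plus NDVI. Band ranges and coefficients are made up for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
red = rng.uniform(0.05, 0.25, 300)                 # stand-in for a red band
nir = rng.uniform(0.30, 0.60, 300)                 # stand-in for a NIR band
ndvi = (nir - red) / (nir + red)                   # standard NDVI definition
yield_bu = 40 + 25 * ndvi + rng.normal(0, 2, 300)  # synthetic yield signal

X = np.column_stack([red, nir, ndvi])              # bands + vegetation index
model = LinearRegression().fit(X, yield_bu)
print(round(model.score(X, yield_bu), 3))          # in-sample R^2
```

A spatial linear model would additionally account for correlation between nearby pixels, which plain least squares ignores.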


Principal Component Analysis With Application To Credit Card Data, Eleanor Cain, Semhar Michael, Gary Hatfield Feb 2024

SDSU Data Science Symposium

Principal Component Analysis (PCA) is a type of dimension reduction technique used in data analysis to process the data before making a model. In general, dimension reduction allows analysts to make conclusions about large data sets by reducing the number of variables while retaining as much information as possible. Using the numerical variables from a data set, PCA aims to compute a smaller set of uncorrelated variables, called principal components, that account for a majority of the variability from the data. The purpose of this poster is to understand PCA as well as perform PCA on a large sample credit …
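A minimal sketch of the workflow described above, on simulated numbers standing in for the credit card variables (standardize, then keep the components covering most of the variance):

```python
# Sketch of PCA for dimension reduction on simulated numeric data.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                    # stand-in for numeric features
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=500)    # induce some correlation

Z = StandardScaler().fit_transform(X)             # PCA is scale-sensitive
pca = PCA(n_components=0.90)                      # keep PCs explaining 90% of variance
scores = pca.fit_transform(Z)                     # the principal components

print(pca.n_components_, pca.explained_variance_ratio_.round(3))
```

The principal components in `scores` are uncorrelated by construction, which is what makes them useful inputs for a downstream model.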


Session 6: Model-Based Clustering Analysis On The Spatial-Temporal And Intensity Patterns Of Tornadoes, Yana Melnykov, Yingying Zhang, Rong Zheng Feb 2024

SDSU Data Science Symposium

Tornadoes are among nature's most violent windstorms and can occur all over the world except Antarctica. Previous scientific efforts have studied this natural hazard from facets such as genesis, dynamics, detection, forecasting, warning, measuring, and assessing. Here, we model tornado datasets using modern, sophisticated statistical and computational techniques. The goal of this paper is to develop novel finite mixture models and perform clustering analysis on the spatial-temporal and intensity patterns of tornadoes. To analyze the tornado dataset, we first try a Gaussian distribution with the mean vector and variance-covariance matrix represented as …


Session 6: The Size-Biased Lognormal Mixture With The Entropy Regularized Algorithm, Tatjana Miljkovic, Taehan Bae Feb 2024

SDSU Data Science Symposium

A size-biased left-truncated Lognormal (SB-ltLN) mixture is proposed as a robust alternative to the Erlang mixture for modeling left-truncated insurance losses with a heavy tail. The weak denseness property of the weighted Lognormal mixture is studied along with the tail behavior. Explicit analytical solutions are derived for moments and Tail Value at Risk based on the proposed model. An extension of the regularized expectation–maximization (REM) algorithm with Shannon's entropy weights (ewREM) is introduced for parameter estimation and variability assessment. The left-truncated internal fraud data set from the Operational Riskdata eXchange is used to illustrate applications of the proposed model. Finally, …


Session 12: Analysis Of State And Parameter Estimation Techniques Using Dynamic Perturbation Signals, Timothy M. Hansen Feb 2023

SDSU Data Science Symposium

The trend in electric power systems is the displacement of traditional synchronous generation (e.g., coal, natural gas) with renewable energy resources (e.g., wind, solar photovoltaic) and battery energy storage. These energy resources require power electronic converters (PECs) to interconnect to the grid and have different response characteristics and dynamic stability issues compared to conventional synchronous generators. As a result, there is a need for validated models to study and mitigate PEC-based stability issues, especially for converter dominated power systems (e.g., island power systems, remote microgrids).

This presentation will introduce methods related to dynamic state and parameter estimation via the design …


Session 11: Skip-GCN: A Framework For Hierarchical Graph Representation Learning, Jackson Cates, Justin Lewis, Randy Hoover, Kyle Caudle Feb 2023

SDSU Data Science Symposium

Recently there has been high demand for representation learning on graphs. Graphs are a complex data structure containing both topology and features, and they arise in several domains, such as infectious disease contact tracing and social media network interactions. The literature describes several methods for representing nodes in an embedding space, allowing classical techniques to perform node classification and prediction. One such method is the graph convolutional neural network, which aggregates the features of a node's neighbors to create the embedding. Another method, Walklets, takes advantage of the topological information stored in a graph …
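The neighbor-aggregation step of a graph convolutional layer can be sketched in a few lines. This is a toy, untrained example; the symmetric normalization used here is the common Kipf-Welling form and may differ from the paper's exact formulation:

```python
# One toy GCN layer on a 4-node path graph: H' = ReLU(A_norm @ H @ W).
import numpy as np

rng = np.random.default_rng(7)
A = np.array([[0, 1, 0, 0],              # adjacency of a small path graph
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], float)
A_hat = A + np.eye(4)                    # add self-loops
D_inv_sqrt = np.diag(1 / np.sqrt(A_hat.sum(1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt # symmetric normalization

H = rng.normal(size=(4, 3))              # node features (3 per node)
W = rng.normal(size=(3, 2))              # weights (random here, i.e. untrained)
H_next = np.maximum(A_norm @ H @ W, 0)   # aggregate neighbors, project, ReLU
print(H_next.shape)
```

Each row of `H_next` is a node embedding that already mixes in information from that node's immediate neighbors; stacking layers widens the neighborhood.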


Two-Stage Approach For Forensic Handwriting Analysis, Ashlan J. Simpson, Danica M. Ommen Feb 2023

SDSU Data Science Symposium

Trained experts currently perform the handwriting analysis required in the criminal justice field, but this can create biases, delays, and expenses, leaving room for improvement. Prior research has sought to address this by analyzing handwriting through feature-based and score-based likelihood ratios for assessing evidence within a probabilistic framework. However, error rates are not well defined within this framework, which makes the method difficult to evaluate and can lead to a greater-than-expected number of errors when applying the approach. This research explores a method for assessing handwriting within the Two-Stage framework, which allows for quantifying error rates as recommended by …


2D Respiratory Sound Analysis To Detect Lung Abnormalities, Rafia Sharmin Alice, Kc Santosh Feb 2023

SDSU Data Science Symposium

In this paper, we analyze deep visual features from 2D data representations of respiratory sound to detect evidence of lung abnormalities. The primary motivation is that visual cues are more important in decision-making than raw data (lung sound). Early detection and prompt treatment are essential for possible future respiratory disorders, and respiratory sound is proven to be one of the biomarkers. In contrast to state-of-the-art approaches, we aim at understanding and analyzing visual features using our Convolutional Neural Network (CNN)-tailored deep learning models, where we consider all possible 2D data such as Spectrogram, Mel-frequency Cepstral Coefficients …


A Characterization Of Bias Introduced Into Forensic Source Identification When There Is A Subpopulation Structure In The Relevant Source Population., Dylan Borchert, Semhar Michael, Christopher Saunders Feb 2023

SDSU Data Science Symposium

In forensic source identification the forensic expert is responsible for providing a summary of the evidence that allows for a decision maker to make a logical and coherent decision concerning the source of some trace evidence of interest. The academic consensus is usually that this summary should take the form of a likelihood ratio (LR) that summarizes the likelihood of the trace evidence arising under two competing propositions. These competing propositions are usually referred to as the prosecution’s proposition, that the specified source is the actual source of the trace evidence, and the defense’s proposition, that another source in a …


Application Of Gaussian Mixture Models To Simulated Additive Manufacturing, Jason Hasse, Semhar Michael, Anamika Prasad Feb 2023

SDSU Data Science Symposium

Additive manufacturing (AM) is the process of building components through an iterative process of adding material in specific designs. AM has a wide range of process parameters that influence the quality of the component. This work applies Gaussian mixture models to detect clusters of similar stress values within and across components manufactured with varying process parameters. Further, a mixture of regression models is considered to simultaneously find groups and also fit regression within each group. The results are compared with a previous naive approach.
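A sketch of the clustering step on simulated one-dimensional stress values. The two stress regimes and the BIC-based choice of the number of components are illustrative assumptions, not details from the paper:

```python
# Fit Gaussian mixture models with varying numbers of components to
# synthetic stress values and pick the number of clusters by BIC.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# two hypothetical stress regimes, e.g. bulk material vs. near a feature
stress = np.concatenate([rng.normal(200, 10, 400),
                         rng.normal(320, 15, 100)]).reshape(-1, 1)

fits = {k: GaussianMixture(k, random_state=0).fit(stress) for k in (1, 2, 3, 4)}
best_k = min(fits, key=lambda k: fits[k].bic(stress))   # lowest BIC wins
labels = fits[best_k].predict(stress)                   # cluster assignments
print(best_k)
```

A mixture of regressions extends this idea by fitting a separate regression line within each mixture component instead of a separate mean.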


Finite Mixture Modeling For Hierarchically Structured Data With Application To Keystroke Dynamics, Andrew Simpson, Semhar Michael Feb 2023

SDSU Data Science Symposium

Keystroke dynamics has been used both to authenticate users of computer systems and to detect unauthorized users who attempt to access the system. Monitoring keystroke dynamics adds another level to computer security, as passwords are often compromised. Keystrokes can also be continuously monitored long after a password has been entered, for added security while the user is accessing the system. Many of the proposed methods are supervised, in that they assume the true user of each keystroke is known a priori. This is not always true, for example, with businesses and government agencies, which have internal …


Models For Predicting Maximum Potential Intensity Of Tropical Cyclones, Iftekhar Chowdhury, Gemechis Djira Feb 2023

SDSU Data Science Symposium

Tropical cyclones (TCs) are extreme weather events featuring a low-pressure center (the eye), strong winds, and a spiral arrangement of thunderstorms that produce heavy rain and storm surges and can cause severe destruction in coastal areas worldwide. Therefore, reliable forecasts of the maximum potential intensity (MPI) of TCs are critical for estimating damage to property and lives and for risk assessment. In this study, we explore and propose various regression models to predict the potential intensity of TCs in the North Atlantic at 12-, 24-, 36-, 48-, 60-, and 72-hour forecasting lead times. In addition, a popular …


Temporal Tensor Factorization For Multidimensional Forecasting, Jackson Cates, Karissa Scipke, Randy Hoover, Kyle Caudle Feb 2023

SDSU Data Science Symposium

In the era of big data, there is a need for forecasting high-dimensional time series that might be incomplete, sparse, and/or nonstationary. The current research aims to solve this problem for two-dimensional data through a combination of temporal matrix factorization (TMF) and low-rank tensor factorization. From this method, we propose an expansion of TMF to two-dimensional data: temporal tensor factorization (TTF). The current research aims to interpolate missing values via low-rank tensor factorization, which produces a latent space of the original multilinear time series. We then can perform forecasting in the latent space. We present experimental results of the proposed …


Session 8: Ensemble Of Score Likelihood Ratios For The Common Source Problem, Federico Veneri, Danica M. Ommen Feb 2023

SDSU Data Science Symposium

Machine learning-based score likelihood ratios have been proposed as an alternative to traditional likelihood ratios and Bayes factors for quantifying the value of evidence when contrasting two opposing propositions.

Under the common source problem, the opposing propositions relate to the inferential problem of assessing whether two items come from the same source. Machine learning techniques can be used to construct a (dis)similarity score for complex data when developing a traditional model is infeasible, and density estimation is used to estimate the likelihood of the scores under both propositions.

In practice, the metric and its distribution are developed using pairwise comparisons …


Session 2: The Effect Of Boom Leveling On Spray Dispersion, Travis A. Burgers, Miguel Bustamante, Juan F. Vivanco Feb 2023

SDSU Data Science Symposium

Self-propelled sprayers are commonly used in agriculture to disperse agrichemicals. These sprayers commonly have two boom wings with dozens of nozzles that disperse the chemicals. Automatic boom height systems reduce the variability of agricultural sprayer boom height, which is important to reduce uneven spray dispersion if the boom is not at the target height.

A computational model was created to simulate the spray dispersion under the following conditions: a) one stationary nozzle, based on the measured spray pattern from one nozzle; b) one stationary model due to an angled boom; c) superposition of multiple stationary nozzles due to an angled boom, …


Session 13: On Statistical Estimates Of The Inverted Kumaraswamy Distribution Under Adaptive Type-I Progressive Hybrid Censoring, Qingqing Li, Yuhlong Lio Feb 2022

SDSU Data Science Symposium

Probability distribution modeling is investigated via the maximum likelihood method based on adaptive type-I progressively hybrid censored samples from the inverted Kumaraswamy distribution. Point estimates of the model parameters, reliability, hazard rate, and quantiles are obtained, and confidence intervals are developed using the asymptotic distribution as well as the bootstrap method. A Monte Carlo simulation has been performed to evaluate the accuracy of the estimates. Finally, a real data set is given to illustrate the application.
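For intuition, a simplified maximum likelihood fit of the inverted Kumaraswamy distribution on a complete (uncensored) simulated sample might look as follows; the adaptive type-I progressive hybrid censoring scheme handled in the paper is deliberately omitted:

```python
# MLE sketch for the inverted Kumaraswamy distribution,
# f(x; a, b) = a*b*(1+x)^{-(a+1)} * [1 - (1+x)^{-a}]^{b-1}, x > 0,
# on a complete sample drawn by inverse-CDF sampling.
import numpy as np
from scipy.optimize import minimize

alpha_true, beta_true, n = 2.0, 3.0, 2000
rng = np.random.default_rng(3)
u = rng.uniform(size=n)
x = (1 - u ** (1 / beta_true)) ** (-1 / alpha_true) - 1  # invert the CDF

def nll(theta):
    a, b = np.exp(theta)                 # log-parameterization keeps a, b > 0
    t = (1 + x) ** (-a)
    return -np.sum(np.log(a) + np.log(b) - (a + 1) * np.log1p(x)
                   + (b - 1) * np.log(1 - t))

res = minimize(nll, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
a_hat, b_hat = np.exp(res.x)
print(round(a_hat, 2), round(b_hat, 2))
```

Under censoring, the likelihood gains survival-function terms for the censored observations, which is where the paper's machinery comes in.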


An Alpha-Based Prescreening Methodology For A Common But Unknown Source Likelihood Ratio With Different Subpopulation Structures, Dylan Borchert, Semhar Michael, Christopher Saunders, Andrew Simpson Feb 2022

SDSU Data Science Symposium

Prescreening is a commonly used methodology in which the forensic examiner includes sources from the background population that meet a certain degree of similarity to the given piece of evidence. The goal of prescreening is to find the sources closest to the given piece of evidence in an alternative source population for further analysis. This paper discusses the behavior of an $\alpha-$based prescreening methodology in the form of a Hotelling $T^2$ test on the background population for a common but unknown source likelihood ratio. An extensive simulation study with synthetic and real data was conducted. We find that prescreening helps …


Identifying Subpopulations Of A Hierarchical Structured Data Using A Semi-Supervised Mixture Modeling Approach, Andrew Simpson, Semhar Michael, Christopher Saunders, Dylan Borchert Feb 2022

SDSU Data Science Symposium

The field of forensic statistics offers a unique hierarchical data structure in which a population is composed of several subpopulations of sources and a sample is collected from each source. This subpopulation structure creates a hierarchical layer. We propose a semi-supervised mixture modeling approach to model the subpopulation structure, which leverages the fact that we know the collection of samples came from the same, yet unknown, source. A simulation study based on a well-known glass dataset was conducted and shows that this method performs better than other unsupervised approaches previously used in practice.


Session 5: Equipment Finance Credit Risk Modeling - A Case Study In Creative Model Development & Nimble Data Engineering, Edward Krueger, Landon Thompson, Josh Moore Feb 2022

SDSU Data Science Symposium

This presentation will focus first on providing an overview of Channel and the Risk Analytics team that performed this case study. Given that context, we’ll then dive into our approach for building the modeling development data set, techniques and tools used to develop and implement the model into a production environment, and some of the challenges faced upon launch. Then, the presentation will pivot to the data engineering pipeline. During this portion, we will explore the application process and what happens to the data we collect. This will include how we extract & store the data along with how it …


Session 11 - Methods: Bootstrap Control Chart For Pareto Percentiles, Ruth Burkhalter Feb 2020

SDSU Data Science Symposium

Lifetime percentiles are important indicators of product reliability. However, the sampling distribution of a percentile estimator for any lifetime distribution is not bell-shaped. As a result, the well-known Shewhart-type control chart cannot be applied to monitor product lifetime percentiles. In this presentation, bootstrap control charts based on the maximum likelihood estimator (MLE) are proposed for monitoring Pareto percentiles. An intensive simulation study is conducted to compare the performance of the proposed MLE bootstrap control chart and the Shewhart-type control chart.
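The bootstrap control chart idea can be sketched for a one-parameter Pareto with known scale 1, an assumption made here for illustration (the presentation's exact Pareto parameterization and chart design may differ):

```python
# Parametric bootstrap control limits for a Pareto percentile estimator.
# Pareto(theta) with scale 1: F(x) = 1 - x^{-theta}, x >= 1.
import numpy as np

rng = np.random.default_rng(4)
theta, n, p, B = 3.0, 50, 0.10, 2000         # true shape, subgroup size, percentile

def pareto_sample(th, size):
    return (1 - rng.uniform(size=size)) ** (-1 / th)   # inverse-CDF sampling

def mle_theta(x):
    return len(x) / np.sum(np.log(x))        # closed-form Pareto MLE

def pctl(th):
    return (1 - p) ** (-1 / th)              # model-based p-th percentile

x0 = pareto_sample(theta, n)                 # Phase I (in-control) sample
th0 = mle_theta(x0)

# bootstrap the percentile estimator from the fitted model
boot = np.array([pctl(mle_theta(pareto_sample(th0, n))) for _ in range(B)])
lcl, ucl = np.quantile(boot, [0.00135, 0.99865])   # 3-sigma-equivalent limits
print(round(lcl, 3), round(ucl, 3))
```

Future subgroups whose estimated percentile falls outside `(lcl, ucl)` would signal a shift in product reliability.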


An Alternative To The One-Size-Fits-All Approach To Isa Training: A Design Science Approach To Isa Regarding The Adaption To Student Vulnerability Based On Knowledge And Behavior, Thomas Jernejcic Feb 2020

SDSU Data Science Symposium

Any connection to the university's network is a conduit that has the potential to be exploited by an attacker, possibly resulting in substantial harm to the infrastructure, the university, and the student body the university serves. While organizations rightfully "batten down the hatches" by building firewalls, creating proxies, and applying important updates, the most significant vulnerability, that of the student, continues to be an issue due to lack of knowledge, insufficient motivation, and inadequate or misguided training. Utilizing the Design Science Research (DSR) methodology, this research effort seeks to address the latter concern of …


Evaluation Of Text Mining Techniques Using Twitter Data For Hurricane Disaster Resilience, Joshua Eason, Sathish Kumar Feb 2020

SDSU Data Science Symposium

Data obtained from social media microblogging websites such as Twitter provide the unique ability to collect and analyze public conversations in order to gain perspective on the thoughts and feelings of the general public. Sentiment and volume analysis techniques were applied to the dataset to gauge the amount and level of sentiment associated with certain disaster-related tweets, including a topical analysis of specific terms. This study showed that disaster-type events such as a hurricane can cause strong negative sentiment in the period directly preceding the event, but sentiment ultimately returns quickly …


Asymptotic Simultaneous Estimations For Contrasts Of Quantiles, Lawrence Sethor Segbehoe, Frank Schaarschmidt, Gemechis Dilba Djira Feb 2020

SDSU Data Science Symposium

Although the expected value is popular, much research in the health and social sciences involves skewed distributions and inferences concerning quantiles. Most standard multiple comparison procedures require the normality assumption. For example, few methods exist for comparing the medians of independent samples or, more generally, the quantiles of several distributions. To our knowledge, there is no general-purpose method for constructing simultaneous confidence intervals for multiple contrasts of quantiles. In this paper, we develop an asymptotic method for constructing such intervals and extend the idea to time-to-event data in survival analysis. Small-sample performance of the proposed method is assessed in …
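For a single contrast, a naive percentile bootstrap interval for a difference of medians illustrates the target quantity; the paper's actual contribution, an asymptotic method for simultaneous intervals over multiple contrasts, is not reproduced here:

```python
# Percentile bootstrap CI for the difference of medians of two skewed samples.
import numpy as np

rng = np.random.default_rng(5)
a = rng.lognormal(0.0, 0.5, 200)      # skewed samples, as in the motivation
b = rng.lognormal(0.3, 0.5, 200)

B = 4000
diffs = np.array([np.median(rng.choice(b, b.size))    # resample each group
                  - np.median(rng.choice(a, a.size))
                  for _ in range(B)])
lo, hi = np.quantile(diffs, [0.025, 0.975])           # 95% percentile interval
print(round(lo, 3), round(hi, 3))
```

Applying such per-contrast intervals to many contrasts at once inflates the family-wise error rate, which is exactly the problem simultaneous intervals address.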


Neural Shrubs: Using Neural Networks To Improve Decision Trees, Kyle Caudle, Randy Hoover, Aaron Alphonsus Feb 2019

SDSU Data Science Symposium

Decision trees are a method commonly used in machine learning to predict either a categorical response or a continuous response variable. Once the tree partitions the space, the response is determined either by majority vote (classification trees) or by averaging the response values (regression trees). This research builds a standard regression tree and then, instead of averaging the responses, trains a neural network to determine the response value. We have found that our approach typically increases the predictive capability of the decision tree. We have 2 demonstrations of this approach that we wish to present as …
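A hedged sketch of the idea on synthetic data: grow a shallow regression tree, then replace each leaf's constant prediction with a small neural network trained on the observations that land in that leaf (the tree depth and network size are assumptions, not the paper's settings):

```python
# "Neural shrub" sketch: regression tree for partitioning, one small MLP per leaf.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(6)
X = rng.uniform(-3, 3, size=(1000, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.1, 1000)

tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
leaves = tree.apply(X)                     # leaf id of each training point

nets = {}                                  # one network per leaf region
for leaf in np.unique(leaves):
    mask = leaves == leaf
    nets[leaf] = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                              random_state=0).fit(X[mask], y[mask])

def predict(Xnew):
    out = np.empty(len(Xnew))
    ids = tree.apply(Xnew)                 # route each point to its leaf's net
    for leaf, net in nets.items():
        m = ids == leaf
        if m.any():
            out[m] = net.predict(Xnew[m])
    return out

resid = y - predict(X)
print(round(float(np.mean(resid ** 2)), 4))
```

Because each leaf's network can fit curvature the leaf mean cannot, the combined model should achieve a lower training error than the tree alone.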


Session 4: Multilinear Subspace Learning And Its Applications To Machine Learning, Randy Hoover, Kyle Caudle Dr., Karen Braman Dr. Feb 2019

SDSU Data Science Symposium

Multi-dimensional data analysis has seen increased interest in recent years. With more and more data arriving as 2-dimensional arrays (images) as opposed to 1-dimensional arrays (signals), new methods for dimensionality reduction, data analysis, and machine learning have been pursued. Most notable have been the Canonical Decomposition/Parallel Factors (commonly referred to as CP) and Tucker decompositions (the latter commonly regarded as a high-order SVD: HOSVD). In the current research we present an alternate method for computing singular value and eigenvalue decompositions on multi-way data through an algebra of circulants and illustrate their application to two well-known machine learning methods: Multi-Linear Principal Component …


Predicting Unplanned Medical Visits Among Patients With Diabetes Using Machine Learning, Arielle Selya, Eric L. Johnson Feb 2019

SDSU Data Science Symposium

Diabetes poses a variety of medical complications to patients, resulting in a high rate of unplanned medical visits, which are costly to patients and healthcare providers alike. However, unplanned medical visits are by nature very difficult to predict. The current project draws upon electronic medical records (EMRs) of adult patients with diabetes who received care at Sanford Health between 2014 and 2017. Various machine learning methods were used to predict which patients have had an unplanned medical visit based on a variety of EMR variables (age, BMI, blood pressure, # of prescriptions, # of diagnoses on problem list, A1C, …