Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Data Science

South Dakota State University

Articles 1 - 8 of 8

Full-Text Articles in Physical Sciences and Mathematics

Session 8: Machine Learning Based Behavior Of Non-Opec Global Supply In Crude Oil Price Determinism, Mofe Jeje Feb 2024

SDSU Data Science Symposium

While the effect of OPEC crude oil supply on global oil price variability is well documented in the energy literature, the impact of non-OPEC global oil supply on price variability has not received commensurate attention. Given this gap, the primary objective of this study is to estimate the magnitude of oil price determinism that is explained by the non-OPEC share of global crude oil supply. Using secondary data sources, data for the target variable, annual crude oil price variability, will be collected from the US Federal Reserve, while …


Predicting Crop Yield Using Remote Sensing Data, Mary Row, Jung-Han Kimn, Hossein Moradi Feb 2024

SDSU Data Science Symposium

Accurate crop yield predictions can help farmers adjust their farming practices to optimize their harvest. Remote sensing is an inexpensive approach to collecting massive amounts of data that could be used for predicting crop yield. This study used linear regression and spatial linear models to predict soybean yield with data from Landsat 8 OLI. Each model was built in three variants: using only the spectral bands of the satellite, only vegetation indices, and both spectral bands and vegetation indices. All analysis was based on data collected from two fields in South Dakota from the 2019 and 2021 …
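The "vegetation indices only" variant described above can be illustrated with a small sketch. It assumes NDVI as the vegetation index (the abstract does not say which indices were used) and ordinary least squares as the linear regression. The reflectance values and the yield figures are synthetic and purely hypothetical, and the function names `ndvi` and `fit_linear` are illustrative.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red).

    For Landsat 8 OLI, NIR is band 5 and Red is band 4.
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)

def fit_linear(features, target):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    design = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

# Hypothetical surface reflectance values for five pixels (not real data).
red = np.array([0.08, 0.06, 0.10, 0.05, 0.07])
nir = np.array([0.40, 0.45, 0.30, 0.50, 0.42])
index = ndvi(nir, red)

# Synthetic yield that is exactly linear in NDVI, so the fit is easy to check.
soy_yield = 20.0 + 50.0 * index
coef = fit_linear(index, soy_yield)
print("intercept, slope:", coef)
```

The "spectral bands only" and "bands plus indices" variants would pass different feature columns to `fit_linear`; a spatial linear model would additionally account for correlation between neighboring pixels, which this sketch does not attempt.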


Principal Component Analysis With Application To Credit Card Data, Eleanor Cain, Semhar Michael, Gary Hatfield Feb 2024

SDSU Data Science Symposium

Principal Component Analysis (PCA) is a dimension reduction technique used in data analysis to preprocess data before building a model. In general, dimension reduction allows analysts to draw conclusions about large data sets by reducing the number of variables while retaining as much information as possible. Using the numerical variables from a data set, PCA aims to compute a smaller set of uncorrelated variables, called principal components, that account for a majority of the variability in the data. The purpose of this poster is to understand PCA as well as perform PCA on a large sample credit …
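The idea of computing uncorrelated components that capture most of the variability can be sketched with an eigendecomposition of the covariance matrix. This is a minimal illustration on synthetic data, not the poster's analysis: the function name `pca`, the random data, and the choice to center but not standardize the variables are all assumptions.

```python
import numpy as np

def pca(data, n_components):
    """PCA via eigendecomposition of the sample covariance matrix.

    Returns the projected scores, the component directions (as columns),
    and the fraction of total variance explained by each kept component.
    """
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)

    # eigh is appropriate for symmetric matrices; it returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # sort descending by variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    components = eigvecs[:, :n_components]
    scores = centered @ components
    explained = eigvals[:n_components] / eigvals.sum()
    return scores, components, explained

# Synthetic data: the second column is almost a multiple of the first.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([base,
               2.0 * base + 0.1 * rng.normal(size=(200, 1)),
               rng.normal(size=(200, 1))])

scores, components, explained = pca(X, n_components=2)
print("explained variance ratio:", explained)
```

In practice, variables measured on very different scales are often standardized before PCA so that no single variable dominates the components.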


Session 6: Model-Based Clustering Analysis On The Spatial-Temporal And Intensity Patterns Of Tornadoes, Yana Melnykov, Yingying Zhang, Rong Zheng Feb 2024

SDSU Data Science Symposium

Tornadoes are among nature's most violent windstorms and can occur all over the world except Antarctica. Previous scientific efforts have studied this natural hazard from facets such as genesis, dynamics, detection, forecasting, warning, measuring, and assessing. In contrast, we want to model tornado datasets using modern statistical and computational techniques. The goal of the paper is to develop novel finite mixture models and perform clustering analysis on the spatial-temporal and intensity patterns of tornadoes. To analyze the tornado dataset, we first try a Gaussian distribution with the mean vector and variance-covariance matrix represented as …


Session 11: Skip-GCN: A Framework For Hierarchical Graph Representation Learning, Jackson Cates, Justin Lewis, Randy Hoover, Kyle Caudle Feb 2023

SDSU Data Science Symposium

Recently there has been high demand for the representation learning of graphs. Graphs are a complex data structure that contains both topology and features. Graphs arise in several domains, such as infectious disease contact tracing and social media network interactions. The literature describes several methods that represent nodes in an embedding space, allowing classical techniques to perform node classification and prediction. One such method is the graph convolutional neural network, which aggregates the features of a node's neighbors to create the embedding. Another method, Walklets, takes advantage of the topological information stored in a graph …
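The neighbor aggregation performed by a graph convolutional layer can be sketched as follows. This is a minimal single-layer illustration, assuming the commonly used symmetric normalization with self-loops and a ReLU activation; it is not the Skip-GCN framework itself. The function name `gcn_layer`, the three-node graph, and the weight matrix are hypothetical.

```python
import numpy as np

def gcn_layer(adj, features, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 @ H @ W).

    adj:      (n, n) symmetric adjacency matrix without self-loops
    features: (n, f) node feature matrix H
    weight:   (f, k) weight matrix W (learned in a real model, fixed here)
    """
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    deg = a_hat.sum(axis=1)                   # degree including the self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    norm_adj = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ features @ weight, 0.0)

# Hypothetical path graph 0 - 1 - 2 with one-hot node features.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
features = np.eye(3)
weight = np.ones((3, 2))

embeddings = gcn_layer(adj, features, weight)
print(embeddings)
```

Each output row mixes a node's own features with those of its direct neighbors; stacking several such layers lets information travel across longer paths in the graph.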


2D Respiratory Sound Analysis To Detect Lung Abnormalities, Rafia Sharmin Alice, Kc Santosh Feb 2023

SDSU Data Science Symposium

In this paper, we analyze deep visual features from 2D data representation(s) of respiratory sound to detect evidence of lung abnormalities. The primary motivation is that visual cues are more important in decision-making than raw data (lung sound). Early detection and prompt treatment are essential for any possible future respiratory disorders, and respiratory sound is proven to be one of the biomarkers. In contrast to state-of-the-art approaches, we aim at understanding and analyzing visual features using our Convolutional Neural Network (CNN) tailored deep learning models, where we consider all possible 2D data such as Spectrogram, Mel-frequency Cepstral Coefficients …


Temporal Tensor Factorization For Multidimensional Forecasting, Jackson Cates, Karissa Scipke, Randy Hoover, Kyle Caudle Feb 2023

SDSU Data Science Symposium

In the era of big data, there is a need for forecasting high-dimensional time series that might be incomplete, sparse, and/or nonstationary. The current research aims to solve this problem for two-dimensional data through a combination of temporal matrix factorization (TMF) and low-rank tensor factorization. Building on this combination, we propose an extension of TMF to two-dimensional data: temporal tensor factorization (TTF). We aim to interpolate missing values via low-rank tensor factorization, which produces a latent space of the original multilinear time series. We can then perform forecasting in the latent space. We present experimental results of the proposed …


Session 5: Equipment Finance Credit Risk Modeling - A Case Study In Creative Model Development & Nimble Data Engineering, Edward Krueger, Landon Thompson, Josh Moore Feb 2022

SDSU Data Science Symposium

This presentation will focus first on providing an overview of Channel and the Risk Analytics team that performed this case study. Given that context, we’ll then dive into our approach for building the modeling development data set, techniques and tools used to develop and implement the model into a production environment, and some of the challenges faced upon launch. Then, the presentation will pivot to the data engineering pipeline. During this portion, we will explore the application process and what happens to the data we collect. This will include how we extract & store the data along with how it …