Open Access. Powered by Scholars. Published by Universities.®

Mathematics Commons

Data Science

2022

Articles 1 - 18 of 18

Full-Text Articles in Mathematics

LSTM-SDM: An Integrated Framework Of LSTM Implementation For Sequential Data Modeling, Hum Nath Bhandari, Binod Rimal, Nawa Raj Pokhrel, Ramchandra Rimal, Keshab R. Dahal Nov 2022

Arts & Sciences Faculty Publications

LSTM-SDM is a Python-based integrated computational framework built on top of TensorFlow/Keras and written in a Jupyter notebook. It provides several object-oriented functionalities for implementing single-layer and multilayer LSTM models for sequential data modeling and time series forecasting. Multiple subroutines are combined to create a user-friendly environment that facilitates data exploration and visualization, normalization and input preparation, hyperparameter tuning, performance evaluation, visualization of results, and statistical analysis. We utilized the LSTM-SDM framework in predicting the stock market index and observed impressive results. The framework can be generalized to solve several other real-world time series problems.
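
The "normalization and input preparation" step mentioned above can be illustrated with a minimal sketch. This is not the LSTM-SDM API; the function names `min_max_scale` and `make_windows` and the sample values are hypothetical, chosen only to show how a series is commonly scaled and cut into fixed-length input windows before it is fed to an LSTM.

```python
def min_max_scale(series):
    """Scale values linearly into [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    span = hi - lo
    return [0.0 if span == 0 else (x - lo) / span for x in series]


def make_windows(series, window):
    """Pair each run of `window` consecutive values with the value that follows it."""
    pairs = []
    for start in range(len(series) - window):
        inputs = series[start:start + window]
        target = series[start + window]
        pairs.append((inputs, target))
    return pairs


# Made-up index values, used only for illustration.
prices = [10.0, 12.0, 11.0, 15.0, 14.0, 20.0]
scaled = min_max_scale(prices)
samples = make_windows(scaled, window=3)

for inputs, target in samples:
    print(inputs, "->", target)
```

Each pair holds three consecutive scaled values as the model input and the next value as the forecasting target. A real pipeline would fit the scaling on the training split only, so that information from the test period does not leak into training.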


Combinatorial Algorithms For Graph Discovery And Experimental Design, Raghavendra K. Addanki Oct 2022

Doctoral Dissertations

In this thesis, we study the design and analysis of algorithms for discovering the structure and properties of an unknown graph, with applications in two different domains: causal inference and sublinear graph algorithms. In both these domains, graph discovery is possible using restricted forms of experiments, and our objective is to design low-cost experiments. First, we describe efficient experimental approaches to the causal discovery problem, which, in its simplest form, asks us to identify the causal relations (edges of the unknown graph) between variables (vertices of the unknown graph) of a given system. For causal discovery, we study algorithms …


Classification Of Pixel Tracks To Improve Track Reconstruction From Proton-Proton Collisions, Kebur Fantahun, Jobin Joseph, Halle Purdom, Nibhrat Lohia Sep 2022

SMU Data Science Review

In this paper, machine learning techniques are used to reconstruct particle collision pathways. CERN (Conseil européen pour la recherche nucléaire) uses a massive underground particle collider, called the Large Hadron Collider or LHC, to produce particle collisions at extremely high speeds. There are several layers of detectors in the collider that track the pathways of particles as they collide. The data produced from collisions contain a large amount of background noise; that is, decays from known particle collisions produce fake signals. In particular, in the first layer of the detector, the pixel tracker, there is an overwhelming amount of background noise that …


Development Of The Implementation Of IoT Monitoring System Based On Node-RED Technology, Anvar Kabulov, Inomjon Yarashov, Salamat Mirzataev Jun 2022

Karakalpak Scientific Journal

This article describes how to design and implement a process for storing environmental information in a database using the Internet of Things. The problems to be solved with the help of this IoT system are the growing worldwide demand for forecasts and the demand of the world market for a new, sustainable method of digitizing the environment through the Internet of Things. The design was implemented using Arduino, Node-RED, and sensors; components were selected based on the required parameters, and the collected data are sent to the database for monitoring and processing. A study of previous work and …


Applications Of Machine Learning Algorithms In Materials Science And Bioinformatics, Mohammed Quazi Jun 2022

Mathematics & Statistics ETDs

The piezoelectric response has been a measure of interest in density functional theory (DFT) for micro-electromechanical systems (MEMS) since the inception of MEMS technology. Piezoelectric-based MEMS devices find wide applications in automobiles, mobile phones, healthcare devices, and silicon chips for computers, to name a few. Piezoelectric properties of doped aluminum nitride (AlN) have been under investigation in materials science for piezoelectric thin films because of its wide range of device applicability. In this research, high-throughput ab initio simulations for 23 AlN alloys are generated using rigorous DFT calculations.

This research is the first to report strong enhancements of piezoelectric properties …


Data Ethics: An Investigation Of Data, Algorithms, And Practice, Gabrialla S. Cockerell May 2022

Honors Projects

This paper examines defective data collection, algorithms, and practices that continue to be cycled through society under the illusion that all information is processed uniformly and that technological innovation consistently parallels societal betterment. However, vulnerable communities, typically the impoverished and racially discriminated, get ensnared in these harmful cycles due to their disadvantages. Their hindrances are reflected in their information due to the interconnectedness of data, such as race being highly correlated with wealth, education, and location. Yet their information continues to be analyzed with the same measures as populations who are not significantly affected by racial bias. Not …


Data And Algorithmic Modeling Approaches To Count Data, Andraya Hack May 2022

Honors College Theses

Various techniques are used to create predictions based on count data. This type of data takes the form of non-negative integers, such as the number of claims an insurance policyholder may make. These predictions can allow people to prepare for likely outcomes. Thus, it is important to know how accurate the predictions are. Traditional statistical approaches for predicting count data include Poisson regression as well as negative binomial regression. Both methods also have a zero-inflated version that can be used when the data has an overabundance of zeros. Another procedure is to use computer algorithms, also known as …
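
To make the zero-inflated idea concrete, here is a minimal sketch of the Poisson probability mass function and its standard zero-inflated variant. It shows only the distributions, not a regression fit, and it is not taken from the thesis; the names `poisson_pmf` and `zip_pmf` and the parameter values are assumptions for illustration.

```python
import math


def poisson_pmf(k, rate):
    """P(X = k) for a Poisson count with mean `rate`."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def zip_pmf(k, rate, pi_zero):
    """Zero-inflated Poisson: with probability `pi_zero` the count is a
    structural zero, otherwise it is drawn from Poisson(rate)."""
    base = (1.0 - pi_zero) * poisson_pmf(k, rate)
    return pi_zero + base if k == 0 else base


rate = 2.0      # assumed mean number of claims per policy
pi_zero = 0.3   # assumed share of policies that never claim

print("P(0) Poisson:", round(poisson_pmf(0, rate), 4))
print("P(0) zero-inflated:", round(zip_pmf(0, rate, pi_zero), 4))
```

The zero-inflated version puts extra probability mass at zero and scales every other count down by `1 - pi_zero`, so its mean is `(1 - pi_zero) * rate` rather than `rate`.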


Intra-Hour Solar Forecasting Using Cloud Dynamics Features Extracted From Ground-Based Infrared Sky Images, Guillermo Terrén-Serrano Apr 2022

Electrical and Computer Engineering ETDs

Due to the increasing use of photovoltaic systems, power grids are vulnerable to the projection of shadows from moving clouds. An intra-hour solar forecast provides power grids with the capability of automatically controlling the dispatch of energy, reducing the additional cost for a guaranteed, reliable supply of energy (i.e., energy storage). This dissertation introduces a novel sky imager consisting of a long-wave radiometric infrared camera and a visible light camera with a fisheye lens. The imager is mounted on a solar tracker to maintain the Sun in the center of the images throughout the day, reducing the scattering effect produced …


A New Application Of The Central Limit Theorem, Kenneth Winters Apr 2022

Selected Honors Theses

This paper discusses the Central Limit Theorem (CLT) and its applications. The paper gives an introduction to what the CLT is and how it can be applied to real life. Additionally, the paper gives a conceptual understanding of the theorem through various examples and visuals. The paper discusses the applications of the CLT in fields such as computer science, psychology, and political science. The author then suggests a new mathematical theorem as an application of the CLT and provides a proof of the theorem. The new theorem relates to expected value and probabilities of random variables and provides a link …
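
As a small illustration of the CLT itself (not of the new theorem proposed in the thesis), the sketch below draws many sample means from a uniform distribution and compares their spread with the value the theorem predicts. The function name `sample_means`, the seed, and the sample sizes are arbitrary choices for this example.

```python
import math
import random
import statistics


def sample_means(n_samples, sample_size, rng):
    """Return the mean of each of `n_samples` samples of Uniform(0, 1) draws."""
    return [
        statistics.fmean([rng.random() for _ in range(sample_size)])
        for _ in range(n_samples)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable
sample_size = 30
means = sample_means(2000, sample_size, rng)

# Uniform(0, 1) has mean 1/2 and variance 1/12, so the CLT predicts that the
# sample means are roughly normal with standard deviation sqrt(1/12) / sqrt(n).
predicted_sd = math.sqrt(1 / 12) / math.sqrt(sample_size)
observed_sd = statistics.stdev(means)

# Share of sample means within 1.96 predicted standard deviations of 1/2;
# for a normal distribution this share is about 95%.
within = sum(abs(m - 0.5) <= 1.96 * predicted_sd for m in means) / len(means)

print("predicted sd:", predicted_sd)
print("observed sd:", observed_sd)
print("share within 1.96 sd:", within)
```

Even though the individual draws are uniform rather than bell-shaped, the sample means cluster around 1/2 with a spread close to the predicted value.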


Quadratic Neural Network Architecture As Evaluated Relative To Conventional Neural Network Architecture, Reid Taylor Apr 2022

Senior Theses

Current work in the field of deep learning and neural networks revolves around several variations of the same mathematical model for associative learning. These variations, while significant and exceptionally applicable in the real world, fail to push the limits of modern computational prowess. This research does just that: by leveraging high-order tensors in place of 2nd-order tensors, quadratic neural networks can be developed, allowing for substantially more complex machine learning models that permit self-interactions of collected and analyzed data. This research shows the theorization and development of the mathematical model necessary for such an idea to …


Session 5: Equipment Finance Credit Risk Modeling - A Case Study In Creative Model Development & Nimble Data Engineering, Edward Krueger, Landon Thompson, Josh Moore Feb 2022

SDSU Data Science Symposium

This presentation will focus first on providing an overview of Channel and the Risk Analytics team that performed this case study. Given that context, we’ll then dive into our approach for building the modeling development data set, techniques and tools used to develop and implement the model into a production environment, and some of the challenges faced upon launch. Then, the presentation will pivot to the data engineering pipeline. During this portion, we will explore the application process and what happens to the data we collect. This will include how we extract & store the data along with how it …


A Predictive Model To Predict Cyberattack Using Self-Normalizing Neural Networks, Oluwapelumi Eniodunmo Jan 2022

Theses, Dissertations and Capstones

Cyberattacks are a never-ending war that has greatly threatened secured information systems. The development of automated and intelligent systems provides more computing power to hackers to steal information and destroy data or system resources, and has raised global security issues. Statistical and data mining tools have received continuous research and improvements. These tools have been adopted to create sophisticated intrusion detection systems that help information systems mitigate and defend against cyberattacks. However, the advancement in technology and accessibility of information creates more identifiable elements that can be used to gain unauthorized access to systems and resources. Data mining and classification tools …


Robust Testing Of Paired Outcomes Incorporating Covariate Effects In Clustered Data With Informative Cluster Size, Sandipan Dutta Jan 2022

Mathematics & Statistics Faculty Publications

Paired outcomes are common in correlated clustered data where the main aim is to compare the distributions of the outcomes in a pair. In such clustered paired data, informative cluster sizes can occur when the number of pairs in a cluster (i.e., a cluster size) is correlated to the paired outcomes or the paired differences. There have been some attempts to develop robust rank-based tests for comparing paired outcomes in such complex clustered data. Most of these existing rank tests developed for paired outcomes in clustered data compare the marginal distributions in a pair and ignore any covariate effect on …


Does Bias Have Shape? An Examination Of The Feasibility Of Algorithmic Detection Of Unfair Bias Using Topological Data Analysis, Ansel Steven Tessier Jan 2022

Senior Projects Spring 2022

Artificial intelligence and machine learning systems are becoming ever more prevalent; at every turn these systems are asked to make decisions that have lasting impacts on people’s lives. It is becoming increasingly important that we ensure these systems are making fair and equitable decisions. For decades we have been aware of biased and unfair decision making in many sectors of society. In recent years a growing body of evidence suggests these biases are being captured in data that are then used to build artificial intelligence and machine learning systems, which themselves perpetuate these biases. The question is then, can we …


Finding The Best Predictors For Foot Traffic In US Seafood Restaurants, Isabel Paige Beaulieu Jan 2022

Honors Theses and Capstones

COVID-19 caused state and nationwide lockdowns, which altered human foot traffic, especially in restaurants. The seafood sector in particular suffered greatly: illegal fishing increased, its goods are perishable, it is seasonal in some places, and imports and exports slowed. Foot traffic data helps business owners decide how much to order, how many employees to schedule, etc. One issue is that the data is very expensive, hard to get, and not available until months after it is recorded. Our goal is to not only find covariates that …


Realtime Event Detection In Sports Sensor Data With Machine Learning, Mallory Cashman Jan 2022

Honors Theses and Capstones

Machine learning models can be trained to classify time-series-based sports motion data without relying on assumptions about the capabilities of the users or sensors. This can be applied to predict the count of occurrences of an event in a time period. The experiment for this research uses lacrosse data, collected in partnership with SPAITR - a UNH undergraduate startup developing motion tracking devices for lacrosse. Decision Tree and Support Vector Machine (SVM) models are trained and perform with high success rates. These models improve upon previous work in human motion event detection and can be used as a reference …


Exo-SIR: An Epidemiological Model To Analyze The Impact Of Exogenous Spread Of Infection, Nirmal Kumar Sivaraman, Manas Gaur, Shivansh Baijal, Sakthi Balan Muthiah, Amit Sheth Jan 2022

Publications

Epidemics like COVID-19 and Ebola have impacted people's lives significantly. The mobility of people across countries or states has contributed significantly to the spread of epidemics. The spread of disease due to factors local to the population under consideration is termed the endogenous spread. The spread due to external factors like migration, mobility, etc. is called the exogenous spread. In this paper, we introduce the Exo-SIR model, an extension of the popular SIR model, along with a few variants of the model. The novelty in our model is that it captures both the exogenous and endogenous spread of …
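
The difference between endogenous and exogenous spread can be sketched with a textbook SIR model plus one extra term. The code below does not reproduce the Exo-SIR equations from the paper: the constant external infection pressure `exo_rate`, the function name `simulate_sir`, and all parameter values are assumptions made for this illustration only.

```python
def simulate_sir(beta, gamma, exo_rate, s0, i0, r0, days, dt=0.1):
    """Integrate an SIR model with forward Euler steps.

    beta     -- endogenous transmission rate (contacts inside the population)
    gamma    -- recovery rate
    exo_rate -- assumed per-capita infection rate from outside sources
    """
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r
    for _ in range(int(round(days / dt))):
        endogenous = beta * s * i / n   # infections caused by local contacts
        exogenous = exo_rate * s        # infections imported from outside
        new_infections = (endogenous + exogenous) * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
    return s, i, r


closed = simulate_sir(0.3, 0.1, 0.0, 990, 10, 0, days=20)
with_exo = simulate_sir(0.3, 0.1, 0.01, 990, 10, 0, days=20)

print("susceptible after 20 days, endogenous only:", round(closed[0], 1))
print("susceptible after 20 days, with exogenous term:", round(with_exo[0], 1))
```

With `exo_rate` set to zero the model reduces to the ordinary SIR model. A positive value depletes the susceptible group faster because infections arrive even when few local cases exist.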


Batch Normalization Preconditioning For Neural Network Training, Susanna Luisa Gertrude Lange Jan 2022

Theses and Dissertations--Mathematics

Batch normalization (BN) is a popular and ubiquitous method in deep learning that has been shown to decrease training time and improve generalization performance of neural networks. Despite its success, BN is not theoretically well understood. It is not suitable for use with very small mini-batch sizes or online learning. In this work, we propose a new method called Batch Normalization Preconditioning (BNP). Instead of applying normalization explicitly through a batch normalization layer as is done in BN, BNP applies normalization by conditioning the parameter gradients directly during training. This is designed to improve the Hessian matrix of the loss …
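
For context, the sketch below shows the normalization step of standard batch normalization, not the BNP method proposed in the dissertation. It assumes a mini-batch stored as a list of rows and omits the learnable scale and shift parameters; the function name `batch_norm` and the sample values are hypothetical.

```python
import math


def batch_norm(batch, eps=1e-5):
    """Normalize each feature (column) of a mini-batch to zero mean and
    approximately unit variance, using statistics of the batch itself."""
    n = len(batch)
    n_features = len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(n_features)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in batch) / n
        for j in range(n_features)
    ]
    return [
        [(row[j] - means[j]) / math.sqrt(variances[j] + eps)
         for j in range(n_features)]
        for row in batch
    ]


batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
print(batch_norm(batch))

# With a single example the batch mean equals the example and the variance
# is zero, so every normalized activation collapses to 0.
print(batch_norm([[3.0, 7.0]]))
```

The second call shows one reason the batch statistics become unreliable for very small mini-batches: with one example per batch, the normalized output carries no information about the input.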