Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 30 of 72

Full-Text Articles in Physical Sciences and Mathematics

Crosshair Optimizer, Jason Torrence Jan 2023

All Master's Theses

Metaheuristic optimization algorithms are heuristics that are capable of creating a "good enough" solution to a computationally complex problem. Algorithms in this area of study are focused on the process of exploration and exploitation: exploration of the solution space and exploitation of the results that have been found during that exploration, with most resources going toward the former half of the process. The novel Crosshair optimizer developed in this thesis seeks to take advantage of the latter, exploiting the best possible result as much as possible by directly searching the area around that best result with a stochastic approach. This …


Weighted Incremental–Decremental Support Vector Machines For Concept Drift With Shifting Window, Honorius Gâlmeanu, Răzvan Andonie Aug 2022

Computer Science Faculty Scholarship

We study the problem of learning the data samples’ distribution as it changes in time. This change, known as concept drift, complicates the task of training a model, as the predictions become less and less accurate. It is known that Support Vector Machines (SVMs) can learn weighted input instances and that they can also be trained online (incremental–decremental learning). Combining these two SVM properties, the open problem is to define an online SVM concept drift model with a shifting weighted window. The classic SVM model should be retrained from scratch after each window shift. We introduce the Weighted Incremental–Decremental SVM (WIDSVM), …


Information Bottleneck In Deep Learning - A Semiotic Approach, Bogdan Musat, Razvan Andonie Jan 2022

Computer Science Faculty Scholarship

The information bottleneck principle was recently proposed as a theory meant to explain some of the training dynamics of deep neural architectures. Via information plane analysis, patterns start to emerge in this framework, where two phases can be distinguished: fitting and compression. We take a step further and study the behaviour of the spatial entropy characterizing the layers of convolutional neural networks (CNNs), in relation to the information bottleneck theory. We observe pattern formations which resemble the information bottleneck fitting and compression phases. From the perspective of semiotics, also known as the study of signs and sign-using behavior, the saliency …


Interpretable Machine Learning For Self-Service High-Risk Decision Making, Charles Recaido Jan 2022

All Master's Theses

This research contributes to interpretable machine learning via visual knowledge discovery in General Line Coordinates (GLC). The concepts of hyperblocks as interpretable dataset units and GLC are combined to create a visual self-service machine learning model. Two variants of GLC known as Dynamic Scaffold Coordinates (DSC) are proposed. DSC1 and DSC2 can map in a lossless manner multiple dataset attributes to a single two-dimensional (X, Y) Cartesian plane using a dynamic scaffolding graph construction algorithm.

Hyperblock analysis is used to determine visually appealing dataset attribute orders and to reduce line occlusion. It is shown that hyperblocks can generalize decision tree …


Concept Drift Adaptation With Incremental–Decremental SVM, Honorius Gâlmeanu, Răzvan Andonie Oct 2021

Computer Science Faculty Scholarship

Data classification in streams where the underlying distribution changes over time is known to be difficult. This problem—known as concept drift detection—involves two aspects: (i) detecting the concept drift and (ii) adapting the classifier. Online training only considers the most recent samples; they form the so-called shifting window. Dynamic adaptation to concept drift is performed by varying the width of the window. Defining an online Support Vector Machine (SVM) classifier able to cope with concept drift by dynamically changing the window size and avoiding retraining from scratch is currently an open problem. We introduce the Adaptive Incremental–Decremental SVM (AIDSVM), a …


Learning In Convolutional Neural Networks Accelerated By Transfer Entropy, Adrian Moldovan, Angel Caţaron, Răzvan Andonie Sep 2021

Computer Science Faculty Scholarship

Recently, there has been growing interest in applying Transfer Entropy (TE) to quantify the effective connectivity between artificial neurons. In a feedforward network, the TE can be used to quantify the relationships between neuron output pairs located in different layers. Our focus is on how to include the TE in the learning mechanisms of a Convolutional Neural Network (CNN) architecture. We introduce a novel training mechanism for CNN architectures which integrates the TE feedback connections. Adding the TE feedback parameter accelerates the training process, as fewer epochs are needed. On the flip side, it adds computational overhead to each epoch. …


Energy Optimization In Multi-UAV-Assisted Edge Data Collection System, Bin Xu, Lu Zhang, Zipeng Xu, Yichuan Liu, Jinming Chai, Sichong Qin, Yanfei Sun Jul 2021

Student Published Works

In the IoT (Internet of Things) system, the introduction of UAV (Unmanned Aerial Vehicle) as a new data collection platform can solve the problem that IoT devices are unable to transmit data over long distances due to the limitation of their battery energy. However, the unreasonable distribution of UAVs will still lead to the problem of the high total energy consumption of the system. In this work, to deal with the problem, a deployment model of a mobile edge computing (MEC) system based on multi-UAV is proposed. The goal of the model is to minimize the energy consumption of the …


Interactive Visual Self-Service Data Classification Approach To Democratize Machine Learning, Sridevi Narayana Wagle Jan 2021

All Master's Theses

Machine learning algorithms often produce models considered as complex black-box models by both end users and developers. Such algorithms fail to explain the model in terms of the domain they are designed for. The proposed Iterative Visual Logical Classifier (IVLC) is an interpretable machine learning algorithm that allows end users to design a model and classify data with more confidence and without having to compromise on the accuracy. Such a technique is especially helpful when dealing with sensitive and crucial data like cancer data in the medical domain with high cost of errors. With the help of the proposed interactive and …


Visualization For Solving Non-Image Problems And Saliency Mapping, Divya Chandrika Kalla Jan 2021

All Master's Theses

High-dimensional data play an important role in knowledge discovery and data science. Integration of visualization, visual analytics, machine learning (ML), and data mining (DM) is a key aspect of data science research for high-dimensional data. This thesis explores the efficiency of a new algorithm that converts non-image data into raster images by visualizing the data as a heatmap in collocated paired coordinates (CPC). These images are called the CPC-R images and the algorithm that produces them is called the CPC-R algorithm. Powerful deep learning methods open an opportunity to solve non-image ML/DM problems by transforming non-image ML problems into …


Semiotic Aggregation In Deep Learning, Bogdan Muşat, Răzvan Andonie Dec 2020

All Faculty Scholarship for the College of the Sciences

Convolutional neural networks utilize a hierarchy of neural network layers. The statistical aspects of information concentration in successive layers can bring an insight into the feature abstraction process. We analyze the saliency maps of these layers from the perspective of semiotics, also known as the study of signs and sign-using behavior. In computational semiotics, this aggregation operation (known as superization) is accompanied by a decrease of spatial entropy: signs are aggregated into supersigns. Using spatial entropy, we compute the information content of the saliency maps and study the superization processes which take place between successive layers of the network. In …


Modeling Multi-Targets Sentiment Classification Via Graph Convolutional Networks And Auxiliary Relation, Ao Feng, Zhengjie Gao, Xinyu Song, Ke Ke, Tianhao Xu, Xuelei Zhang Jun 2020

All Faculty Scholarship for the College of the Sciences

Existing solutions do not work well when multiple targets coexist in a sentence. The reason is that existing solutions usually separate the targets and process them one at a time. If the original sentence has N targets, the sentence is repeated N times, and only one target is processed each time. To some extent, this approach degenerates the fine-grained sentiment classification task into a sentence-level sentiment classification task, and processing the targets separately ignores the internal relations and interactions between them. Based on the above considerations, we propose to use Graph Convolutional …


Weighted Random Search For CNN Hyperparameter Optimization, Rǎzvan Andonie, Adrian-Cǎtǎlin Florea Apr 2020

All Faculty Scholarship for the College of the Sciences

Nearly all model algorithms used in machine learning use two different sets of parameters: the training parameters and the meta-parameters (hyperparameters). While the training parameters are learned during the training phase, the values of the hyperparameters have to be specified before learning starts. For a given dataset, we would like to find the optimal combination of hyperparameter values, in a reasonable amount of time. This is a challenging task because of its computational complexity. In previous work, we introduced the Weighted Random Search (WRS) method, a combination of Random Search (RS) and probabilistic greedy heuristic. In the current paper, we …


Learning In Feedforward Neural Networks Accelerated By Transfer Entropy, Adrian Moldovan, Angel Caţaron, Rǎzvan Andonie Jan 2020

All Faculty Scholarship for the College of the Sciences

Current neural network architectures are many times harder to train because of the increasing size and complexity of the datasets used. Our objective is to design more efficient training algorithms utilizing causal relationships inferred from neural networks. The transfer entropy (TE) was initially introduced as an information transfer measure used to quantify the statistical coherence between events (time series). Later, it was related to causality, even if they are not the same. There are only a few papers reporting applications of causality or TE in neural networks. Our contribution is an information-theoretical method for analyzing information transfer between the nodes of …


Using CUDA To Enhance Data Processing Of Variant Call Format Files For Statistical Genetic Analysis, Heather Mckinnon Jan 2020

All Graduate Projects

Utilizing the power of GPU parallel processing with CUDA can speed up the processing of Variant Call Format (VCF) files and statistical analysis of genomic data. A software package designed toward this purpose would be beneficial to genetic researchers by saving them time which they could spend on other aspects of their research. A data set containing genetics from a study of trichome production in Mimulus guttatus, or yellow monkey flower, was used to develop a package to test the effectiveness of GPU parallel processing versus serial executions. After a serial version of the code was generated and benchmarked, OpenACC …


Optimizing Pollution Routing Problem, Shivika Dewan Jan 2020

All Master's Theses

Pollution is a major environmental issue around the world. Despite the growing use and impact of commercial vehicles, recent research has been conducted with minimizing pollution as the primary objective. The objective of this project is to implement different optimization algorithms to solve this problem. A basic model is created using the Vehicle Routing Problem (VRP) which is further extended to the Pollution Routing Problem (PRP). The basic model is updated using a Monte Carlo Algorithm (MCA). The data set contains 180 data files with a combination of 10, 15, 20, 25, 50, 75, 100, 150, and …


Image Features For Tuberculosis Classification In Digital Chest Radiographs, Brian Hooper Jan 2020

All Master's Theses

Tuberculosis (TB) is a respiratory disease which affects millions of people each year, accounting for the tenth leading cause of death worldwide, and is especially prevalent in underdeveloped regions where access to adequate medical care may be limited. Analysis of digital chest radiographs (CXRs) is a common and inexpensive method for the diagnosis of TB; however, a trained radiologist is required to interpret the results, and is subject to human error. Computer-Aided Detection (CAD) systems are a promising machine-learning based solution to automate the diagnosis of TB from CXR images. As the dimensionality of a high-resolution CXR image is very …


Toward Efficient Automation Of Interpretable Machine Learning Boosting, Nathan Neuhaus Jan 2020

All Master's Theses

Developing efficient automated methods for Interpretable Machine Learning (IML) is an important and long-term goal in the field of Artificial Intelligence. Currently the Machine Learning landscape is dominated by Neural Networks (NNs) and Support Vector Machines (SVMs), models which are often highly accurate. Despite high accuracy, such models are essentially “black boxes” and therefore are too risky for situations like healthcare where real lives are at stake. In such situations, so-called “glass-box” models, such as Decision Trees (DTs), Bayesian Networks (BNs), and Logic Relational (LR) models, are often preferred; however, they can succumb to accuracy limitations. Unfortunately, having to choose …


Automated Morgan Keenan Classification Of Observed Stellar Spectra Collected By The Sloan Digital Sky Survey Using A Single Classifier, Michael J. Brice, Răzvan Andonie Oct 2019

All Faculty Scholarship for the College of the Sciences

The classification of stellar spectra is a fundamental task in stellar astrophysics. Stellar spectra from the Sloan Digital Sky Survey are applied to standard classification methods, k-nearest neighbors and random forest, to automatically classify the spectra. Stellar spectra are high dimensional data and the dimensionality is reduced using astronomical knowledge because classifiers work in low dimensional space. These methods are utilized to classify the stellar spectra into a complete Morgan Keenan classification (spectral and luminosity) using a single classifier. The motion of stars (radial velocity) causes machine-learning complications through the feature matrix when classifying stellar spectra. Due to the nature …


Weighted Random Search For Hyperparameter Optimization, Adrian-Cǎtǎlin Florea, Rǎzvan Andonie Apr 2019

All Faculty Scholarship for the College of the Sciences

We introduce an improved version of Random Search (RS), used here for hyperparameter optimization of machine learning algorithms. Unlike the standard RS, which generates for each trial new values for all hyperparameters, we generate new values for each hyperparameter with a probability of change. The intuition behind our approach is that a value that already triggered a good result is a good candidate for the next step, and should be tested in new combinations of hyperparameter values. Within the same computational budget, our method yields better results than the standard RS. Our theoretical results prove this statement. We test our …
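The idea of resampling each hyperparameter only with some probability can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the per-hyperparameter `p_change` dictionary, and the choice to keep unchanged values from the best configuration found so far are all assumptions made for illustration, and the abstract does not say how the probabilities are chosen.

```python
import random


def sample_all(space, rng):
    """Standard random search step: draw a fresh value for every hyperparameter."""
    return {name: rng.choice(values) for name, values in space.items()}


def random_search_with_change_probability(objective, space, p_change, n_trials, seed=0):
    """Maximize `objective` over a discrete search space.

    Each trial resamples a hyperparameter with probability p_change[name];
    otherwise it reuses the value from the best configuration so far
    (an assumption of this sketch, not a detail taken from the paper).
    """
    rng = random.Random(seed)
    best = sample_all(space, rng)
    best_score = objective(best)
    for _ in range(n_trials - 1):
        candidate = {
            name: rng.choice(values) if rng.random() < p_change[name] else best[name]
            for name, values in space.items()
        }
        score = objective(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


if __name__ == "__main__":
    # Hypothetical toy objective with its maximum at x=3, y=1.
    space = {"x": list(range(10)), "y": list(range(5))}
    objective = lambda cfg: -((cfg["x"] - 3) ** 2) - (cfg["y"] - 1) ** 2
    print(random_search_with_change_probability(
        objective, space, {"x": 0.5, "y": 0.5}, n_trials=200))
```

Setting every probability to 1 recovers standard random search, while setting them all to 0 freezes the first sampled configuration, so the probabilities control how much of a good configuration is carried into the next trial.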


Classification Of Stars From Redshifted Stellar Spectra Utilizing Machine Learning, Michael J. Brice Jan 2019

All Master's Theses

The classification of stellar spectra is a fundamental task in stellar astrophysics. There have been many explorations into the automated classification of stellar spectra but few that involve the Sloan Digital Sky Survey (SDSS). Stellar spectra from the SDSS are applied to standard classification methods such as K-Nearest Neighbors, Random Forest, and Support Vector Machine to automatically classify the spectra. Stellar spectra are high dimensional data and the dimensionality is reduced using standard Feature Selection methods such as Chi-Squared and Fisher score and with domain-specific astronomical knowledge because classifiers work in low dimensional space. These methods are utilized to classify …


Automatic Classification And Shift Detection Of Facial Expressions In Event-Aware Smart Environments, Arne Bernin, Larissa Müller, Sobin Ghose, Christos Grecos, Qi Wang, Ralf Jettke, Kai Von Luck, Florian Vogt Jun 2018

All Faculty Scholarship for the College of the Sciences

Affective application developers often face a challenge in integrating the output of facial expression recognition (FER) software in interactive systems: although many algorithms have been proposed for FER, integrating the results of these algorithms into applications remains difficult. Due to inter- and within-subject variations further post-processing is needed. Our work addresses this problem by introducing and comparing three post-processing classification algorithms for FER output applied to an event-based interaction scheme to pinpoint the affective context within a time window. Our comparison is based on earlier published experiments with an interactive cycling simulation in which participants were provoked with game elements …


Transfer Information Energy: A Quantitative Indicator Of Information Transfer Between Time Series, Angel Caţaron, Rǎzvan Andonie Apr 2018

All Faculty Scholarship for the College of the Sciences

We introduce an information-theoretical approach for analyzing information transfer between time series. Rather than using the Transfer Entropy (TE), we define and apply the Transfer Information Energy (TIE), which is based on Onicescu’s Information Energy. Whereas the TE can be used as a measure of the reduction in uncertainty about one time series given another, the TIE may be viewed as a measure of the increase in certainty about one time series given another. We compare the TIE and the TE in two known time series prediction applications. First, we analyze stock market indexes from the Americas, Asia/Pacific and Europe, …
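The contrast between an uncertainty measure and a certainty measure can be seen on a single discrete distribution. The sketch below compares Shannon entropy with Onicescu's information energy, which for probabilities p_k is the sum of p_k squared. It does not reproduce the TIE or TE definitions from the paper; the function names are illustrative assumptions.

```python
import math
from collections import Counter


def empirical_distribution(samples):
    """Relative frequencies of the distinct values in a sequence."""
    counts = Counter(samples)
    total = len(samples)
    return [count / total for count in counts.values()]


def shannon_entropy(probs):
    """Uncertainty in bits: largest for a uniform distribution, 0 for a certain outcome."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_energy(probs):
    """Onicescu's information energy: sum of squared probabilities.

    It moves in the opposite direction to entropy: smallest (1/n) for a
    uniform distribution over n outcomes, 1 for a certain outcome.
    """
    return sum(p * p for p in probs)


if __name__ == "__main__":
    for label, samples in [("uniform", "abcd"), ("skewed", "aaab"), ("certain", "aaaa")]:
        probs = empirical_distribution(samples)
        print(label, shannon_entropy(probs), information_energy(probs))
```

As the distribution becomes more concentrated, the entropy falls and the information energy rises, which matches the abstract's reading of one as a reduction in uncertainty and the other as an increase in certainty.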


Retrospective Analysis And Prediction: Artificial Intelligence And Its Applications In Libraries, Ping Fu Mar 2018

Library Scholarship

The application of Artificial Intelligence (AI) has brought significant innovation to fundamental science and research in recent years. This paper briefly reviews and analyzes the findings of research and development of AI technologies such as expert systems, natural language processing, pattern recognition, robotics and machine learning in library fields such as information retrieval, reference service, cataloging, classification, acquisitions, circulation and automation. By reviewing and analyzing research papers published in respected academic journals and studying examples and practical cases of the latest AI applications in industry, this study finds that current AI applications in the library field are …


Deep Learning Of 2-D Images Representing N-D Data In General Line Coordinates, Dmytro Dovhalets, Boris Kovalerchuk, Szilárd Vajda, Răzvan Andonie Jan 2018

Computer Science Faculty Scholarship

While knowledge discovery and n-D data visualization procedures are often efficient, the loss of information, occlusion, and clutter continue to be a challenge. General Line Coordinates (GLC) is a rather new technique to deal with such artifacts. GLC-Linear, which is one of the methods in GLC, allows transforming n-D numerical data to their visual representation as polylines losslessly. The method proposed in this paper uses these 2-D visual representations as input to Convolutional Neural Network (CNN) classifiers. The obtained classification accuracies are close to the ones obtained by other machine learning algorithms. The main benefit of the method is the …


Looking At Faces In The Wild, Eugene Borovikov, Szilárd Vajda, Michael Bonifant, Michael Gill Jan 2018

Computer Science Faculty Scholarship

Recent advances in face detection (FD) and recognition (FR) technology may give the impression that the problem of face matching is essentially solved, e.g. via deep learning models using thousands of samples per face for training and validation on the available benchmark datasets. The human visual system seems to handle the face localization and matching problems differently from modern FR systems, since humans detect faces instantly even in the most cluttered environments, and often require a single view of a face to reliably distinguish it from all others. This prompted us to take a biologically inspired look at building a cognitive …


Data Visualization And Classification Of Artificially Created Images, Dmytro Dovhalets Jan 2018

All Master's Theses

Visualization of multidimensional data is a long-standing challenge in machine learning and knowledge discovery. A problem arises as soon as 4 dimensions are introduced since we live in a 3-dimensional world. There are methods that can visualize multidimensional data, but loss of information and clutter are still a problem. General Line Coordinates (GLC) can losslessly project n-dimensional data into 2 dimensions. A new method is introduced based on GLC called GLC-L. This new method can do interactive visualization, dimension reduction, and supervised learning. One of the applications of GLC-L is transformation of vector data into image data. This novel …


Decreasing Occlusion And Increasing Explanation In Interactive Visual Knowledge Discovery, Abdulrahman Ahmed Gharawi Jan 2018

All Master's Theses

Lack of explanation and occlusion are the major problems for interactive visual knowledge discovery, machine learning and data mining in multidimensional data. This thesis proposes a hybrid method that combines visual and analytical means to deal with these problems. This method, denoted as FSP, uses visualization of n-D data in 2-D in a set of Shifted Paired Coordinates (SPC). SPC for n-D data consists of n/2 pairs of Cartesian coordinates that are shifted relative to each other to avoid their overlap. Each n-D point is represented as a directed graph in SPC. It is shown that the FSP method simplifies …
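The SPC construction described above, with n/2 shifted coordinate pairs and one directed graph per n-D point, can be sketched in a few lines of Python. This is an illustrative reading of the abstract, not the thesis code: the function name `spc_graph` and the shift values are hypothetical, and the FSP method built on top of SPC is not shown.

```python
def spc_graph(point, shifts):
    """Map an n-D point (n even) to a directed graph in Shifted Paired Coordinates.

    Consecutive values (x1, x2), (x3, x4), ... are treated as Cartesian pairs.
    Pair i is drawn in its own coordinate system, whose origin is moved by
    shifts[i] so that the n/2 systems do not overlap. Directed edges connect
    the resulting 2-D nodes in order.
    """
    if len(point) % 2 != 0:
        raise ValueError("SPC as sketched here needs an even number of dimensions")
    pairs = [(point[i], point[i + 1]) for i in range(0, len(point), 2)]
    if len(shifts) != len(pairs):
        raise ValueError("one shift per coordinate pair is required")
    nodes = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(pairs, shifts)]
    edges = list(zip(nodes, nodes[1:]))  # node i -> node i + 1
    return nodes, edges


if __name__ == "__main__":
    # Hypothetical 6-D point and horizontal shifts of 0, 10, and 20 units.
    nodes, edges = spc_graph((1, 2, 3, 4, 5, 6), [(0, 0), (10, 0), (20, 0)])
    print(nodes)
    print(edges)
```

Because each pair keeps both of its original values and the shifts are known, the n-D point can be recovered from the 2-D nodes by subtracting the shifts, which is what makes the representation lossless.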


Spike-Based Classification Of UCI Datasets With Multi-Layer Resume-Like Tempotron, Sami Abdul-Wahid Jan 2018

All Master's Theses

Spiking neurons are a class of neuron models that represent information in timed sequences called "spikes." Though predominantly used in neuro-scientific investigations, spiking neural networks (SNN) can be applied to machine learning problems such as classification and regression. SNN are computationally more powerful per neuron than traditional neural networks. Though training time is slow on general purpose computers, spike-based hardware implementations are faster and have shown capability for ultra-low power consumption. Additionally, various SNN training algorithms have achieved comparable performance with the state of the art on the Fisher Iris dataset. Our main contribution is a software implementation of the …


Asymptotically Unbiased Estimation Of A Nonsymmetric Dependence Measure Applied To Sensor Data Analytics And Financial Time Series, Angel Caţaron, Razvan Andonie, Yvonne Chueh Aug 2017

All Faculty Scholarship for the College of the Sciences

A fundamental concept frequently applied to statistical machine learning is the detection of dependencies between unknown random variables found from data samples. In previous work, we have introduced a nonparametric unilateral dependence measure based on Onicescu’s information energy and a kNN method for estimating this measure from an available sample set of discrete or continuous variables. This paper provides the formal proofs which show that the estimator is asymptotically unbiased and has asymptotic zero variance when the sample size increases. It implies that the estimator has good statistical qualities. We investigate the performance of the estimator for data analysis applications …


Constructing Interactive Visual Classification, Clustering And Dimension Reduction Models For N-D Data, Boris Kovalerchuk, Dmytro Dovhalets Jul 2017

Computer Science Faculty Scholarship

The exploration of multidimensional datasets of all possible sizes and dimensions is a long-standing challenge in knowledge discovery, machine learning, and visualization. While multiple efficient visualization methods for n-D data analysis exist, the loss of information, occlusion, and clutter continue to be a challenge. This paper proposes and explores a new interactive method for visual discovery of n-D relations for supervised learning. The method includes automatic, interactive, and combined algorithms for discovering linear relations, dimension reduction, and generalization for non-linear relations. This method is a special category of reversible General Line Coordinates (GLC). It produces graphs in 2-D that represent …