Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 20 of 20

Full-Text Articles in Physical Sciences and Mathematics

Crosshair Optimizer, Jason Torrence Jan 2023

All Master's Theses

Metaheuristic optimization algorithms are heuristics that are capable of creating a "good enough" solution to a computationally complex problem. Algorithms in this area of study are focused on the process of exploration and exploitation: exploration of the solution space and exploitation of the results that have been found during that exploration, with most resources going toward the former half of the process. The novel Crosshair optimizer developed in this thesis seeks to take advantage of the latter, exploiting the best possible result as much as possible by directly searching the area around that best result with a stochastic approach. This …


Interpretable Machine Learning For Self-Service High-Risk Decision Making, Charles Recaido Jan 2022

All Master's Theses

This research contributes to interpretable machine learning via visual knowledge discovery in General Line Coordinates (GLC). The concepts of hyperblocks as interpretable dataset units and GLC are combined to create a visual self-service machine learning model. Two variants of GLC known as Dynamic Scaffold Coordinates (DSC) are proposed. DSC1 and DSC2 can map in a lossless manner multiple dataset attributes to a single two-dimensional (X, Y) Cartesian plane using a dynamic scaffolding graph construction algorithm.

Hyperblock analysis is used to determine visually appealing dataset attribute orders and to reduce line occlusion. It is shown that hyperblocks can generalize decision tree …


Interactive Visual Self-Service Data Classification Approach To Democratize Machine Learning, Sridevi Narayana Wagle Jan 2021

All Master's Theses

Machine learning algorithms often produce models that both end users and developers regard as complex black boxes. Such algorithms fail to explain the model in terms of the domain they are designed for. The proposed Iterative Visual Logical Classifier (IVLC) is an interpretable machine learning algorithm that allows end users to design a model and classify data with more confidence and without having to compromise on accuracy. Such a technique is especially helpful when dealing with sensitive and crucial data, like cancer data in the medical domain, where errors carry a high cost. With the help of the proposed interactive and …


Visualization For Solving Non-Image Problems And Saliency Mapping, Divya Chandrika Kalla Jan 2021

All Master's Theses

High-dimensional data play an important role in knowledge discovery and data science. Integration of visualization, visual analytics, machine learning (ML), and data mining (DM) is a key aspect of data science research for high-dimensional data. This thesis explores the efficiency of a new algorithm that converts non-image data into raster images by visualizing the data as a heatmap in collocated paired coordinates (CPC). These images are called CPC-R images, and the algorithm that produces them is called the CPC-R algorithm. Powerful deep learning methods open an opportunity to solve non-image ML/DM problems by transforming non-image ML problems into …


Using CUDA To Enhance Data Processing Of Variant Call Format Files For Statistical Genetic Analysis, Heather Mckinnon Jan 2020

All Graduate Projects

Utilizing the power of GPU parallel processing with CUDA can speed up the processing of Variant Call Format (VCF) files and statistical analysis of genomic data. A software package designed toward this purpose would be beneficial to genetic researchers by saving them time which they could spend on other aspects of their research. A data set containing genetics from a study of trichome production in Mimulus guttatus, or yellow monkey flower, was used to develop a package to test the effectiveness of GPU parallel processing versus serial executions. After a serial version of the code was generated and benchmarked, OpenACC …


Image Features For Tuberculosis Classification In Digital Chest Radiographs, Brian Hooper Jan 2020

All Master's Theses

Tuberculosis (TB) is a respiratory disease that affects millions of people each year, ranks as the tenth leading cause of death worldwide, and is especially prevalent in underdeveloped regions where access to adequate medical care may be limited. Analysis of digital chest radiographs (CXRs) is a common and inexpensive method for the diagnosis of TB; however, a trained radiologist is required to interpret the results, and that interpretation is subject to human error. Computer-Aided Detection (CAD) systems are a promising machine-learning-based solution to automate the diagnosis of TB from CXR images. As the dimensionality of a high-resolution CXR image is very …


Toward Efficient Automation Of Interpretable Machine Learning Boosting, Nathan Neuhaus Jan 2020

All Master's Theses

Developing efficient automated methods for Interpretable Machine Learning (IML) is an important and long-term goal in the field of Artificial Intelligence. Currently the Machine Learning landscape is dominated by Neural Networks (NNs) and Support Vector Machines (SVMs), models which are often highly accurate. Despite high accuracy, such models are essentially “black boxes” and therefore are too risky for situations like healthcare where real lives are at stake. In such situations, so-called “glass-box” models, such as Decision Trees (DTs), Bayesian Networks (BNs), and Logic Relational (LR) models, are often preferred but can succumb to accuracy limitations. Unfortunately, having to choose …


Optimizing Pollution Routing Problem, Shivika Dewan Jan 2020

All Master's Theses

Pollution is a major environmental issue around the world. Despite the growing use and impact of commercial vehicles, recent research has been conducted with minimizing pollution as the primary objective to be reduced. The objective of this project is to implement different optimization algorithms to solve this problem. A basic model is created using the Vehicle Routing Problem (VRP) which is further extended to the Pollution Routing Problem (PRP). The basic model is updated using a Monte Carlo Algorithm (MCA). The data set contains 180 data files with a combination of 10, 15, 20, 25, 50, 75, 100, 150, and …


Classification Of Stars From Redshifted Stellar Spectra Utilizing Machine Learning, Michael J. Brice Jan 2019

All Master's Theses

The classification of stellar spectra is a fundamental task in stellar astrophysics. There have been many explorations into the automated classification of stellar spectra but few that involve the Sloan Digital Sky Survey (SDSS). Stellar spectra from the SDSS are applied to standard classification methods such as K-Nearest Neighbors, Random Forest, and Support Vector Machine to automatically classify the spectra. Stellar spectra are high dimensional data and the dimensionality is reduced using standard Feature Selection methods such as Chi-Squared and Fisher score and with domain-specific astronomical knowledge because classifiers work in low dimensional space. These methods are utilized to classify …


Spike-Based Classification Of UCI Datasets With Multi-Layer Resume-Like Tempotron, Sami Abdul-Wahid Jan 2018

All Master's Theses

Spiking neurons are a class of neuron models that represent information in timed sequences called "spikes." Though predominantly used in neuro-scientific investigations, spiking neural networks (SNN) can be applied to machine learning problems such as classification and regression. SNN are computationally more powerful per neuron than traditional neural networks. Though training time is slow on general purpose computers, spike-based hardware implementations are faster and have shown capability for ultra-low power consumption. Additionally, various SNN training algorithms have achieved comparable performance with the state of the art on the Fisher Iris dataset. Our main contribution is a software implementation of the …


Data Visualization And Classification Of Artificially Created Images, Dmytro Dovhalets Jan 2018

All Master's Theses

Visualization of multidimensional data is a long-standing challenge in machine learning and knowledge discovery. A problem arises as soon as 4 dimensions are introduced, since we live in a 3-dimensional world. Existing methods can visualize multidimensional data, but loss of information and clutter are still a problem. General Line Coordinates (GLC) can losslessly project n-dimensional data in 2 dimensions. A new method based on GLC, called GLC-L, is introduced. This new method can do interactive visualization, dimension reduction, and supervised learning. One of the applications of GLC-L is transformation of vector data into image data. This novel …


Decreasing Occlusion And Increasing Explanation In Interactive Visual Knowledge Discovery, Abdulrahman Ahmed Gharawi Jan 2018

All Master's Theses

Lack of explanation and occlusion are the major problems for interactive visual knowledge discovery, machine learning and data mining in multidimensional data. This thesis proposes a hybrid method that combines visual and analytical means to deal with these problems. This method, denoted as FSP, uses visualization of n-D data in 2-D in a set of Shifted Paired Coordinates (SPC). SPC for n-D data consists of n/2 pairs of Cartesian coordinates that are shifted relative to each other to avoid their overlap. Each n-D point is represented as a directed graph in SPC. It is shown that the FSP method simplifies …
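
The Shifted Paired Coordinates construction mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `spc_graph`, the `shift` parameter, and the choice of a constant shift per pair are assumptions made only for illustration. Consecutive attribute pairs (x1, x2), (x3, x4), … are each drawn in their own Cartesian system, each system is offset from the previous one, and the resulting 2-D nodes are linked by directed edges.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]
Edge = Tuple[Point2D, Point2D]


def spc_graph(
    point: Sequence[float], shift: Point2D = (2.0, 0.0)
) -> Tuple[List[Point2D], List[Edge]]:
    """Map one n-D point (n even) to a directed graph of n/2 nodes in 2-D.

    The k-th pair (point[2k], point[2k+1]) is placed in a coordinate system
    whose origin is offset by k * shift (a hypothetical, constant offset).
    """
    if len(point) % 2 != 0:
        raise ValueError("SPC needs an even number of attributes")

    nodes: List[Point2D] = []
    for k in range(len(point) // 2):
        x, y = point[2 * k], point[2 * k + 1]
        nodes.append((x + k * shift[0], y + k * shift[1]))

    # Directed edges connect each pair's node to the next pair's node.
    edges: List[Edge] = list(zip(nodes, nodes[1:]))
    return nodes, edges


if __name__ == "__main__":
    nodes, edges = spc_graph([0.25, 0.5, 0.75, 1.0])
    print(nodes)
    print(edges)
```

A 6-D point yields three nodes and two directed edges under this sketch; how the shifts are actually chosen in the FSP method is not stated in the abstract.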


Visualizing Incongruity: Visual Data Mining Strategies For Modeling Humor In Text, Andrew Smigaj Jan 2017

All Graduate Projects

The goal of this project is to investigate the use of visual data mining to model verbal humor. We explored various means of text visualization to identify key features of garden path jokes as compared with non-jokes. With garden path jokes, one interpretation is established in the setup, but new information indicating some alternative interpretation triggers some resolution process leading to a new interpretation. For this project we visualize text in three novel ways, assisted by some web mining to build an informal ontology, that allow us to see the differences between garden path jokes and non-jokes of …


Visualizing Multidimensional Data With General Line Coordinates And Pareto Optimization, Jacob Brown Jan 2017

All Master's Theses

These results will show that the use of Linear General Line Coordinates (GLC-L) can visualize multidimensional data better than typical methods, such as Parallel Coordinates (PC). The results of using GLC-L will display visuals with less clutter than PC and make changes from one graph to the next easier to see. Visualizing the Pareto Frontier with GLC-L allows n-D data to be viewed at once, compared to typical methods that are limited to 2 or 3 objectives at a time. This method details the process of selecting a "best" case, from a group of equals in the Pareto Subset and …


Applications Of Computational Geometry And Computer Vision, Joseph Lemley Jan 2016

All Master's Theses

Recent advances in machine learning research promise to bring us closer to the original goals of artificial intelligence. Spurred by recent innovations in low-cost, specialized hardware and incremental refinements in machine learning algorithms, machine learning is revolutionizing entire industries. Perhaps the biggest beneficiary of this progress has been the field of computer vision. Within the domains of computational geometry and computer vision are two problems: Finding large, interesting holes in high dimensional data, and locating and automatically classifying facial features from images. State of the art methods for facial feature classification are compared and new methods for finding empty hyper-rectangles …


Intentional Recruiting: Using Business Intelligence, Data Mining, And Predictive Analytics To Identify Characteristics Of Those Students Who Enroll, And Graduate; In Support Of University Enrollment Management, Stephanie L. Harris Jan 2015

All Master's Theses

Using business intelligence (BI) and archival data from a Division II public comprehensive university in Washington State, the researcher identified specific characteristics of those students who enrolled, persisted, and completed an undergraduate degree. These characteristics created an applicant profile to be used in future enrollment management activities for intentional recruiting, while the predictive models for enrollment and completion inform administration to improve tuition revenue planning and budgeting, and to forecast future enrollment yield.


Using Time Series Models For Defect Prediction In Software Release Planning, James W. Tunnell Jan 2015

All Master's Theses

To produce a high-quality software release, sufficient time should be allowed for testing and fixing defects. Otherwise, there is a risk of slip in the development schedule and/or software quality. A time series model is used to predict the number of bugs created during development. The model depends on the previous numbers of bugs created. The model also depends, in an exogenous manner, on the previous numbers of new features resolved and improvements resolved. This model structure would allow hypothetical release plans to be compared by assessing their predicted impact on testing and defect-fixing time. The VARX time series …


Development Of A Hypermedia Database For The Elementary Classroom, Jane Pattison Brown Jan 1993

All Graduate Projects

A hypermedia database including selected flora and fauna in Kittitas County, Washington, was developed using HyperCard software for researching information about 400+ species. Students had the opportunity to use the database in school libraries and in the author's fifth-grade classroom. The database cards accessed videodisc images where available. On the basis of limited study to date, it appears that when studying the environment, student learning was enhanced by the use of database material created by the author.


Grades - A Computer Based Score Management System For The IIGS, Charles Patrick Wahle Jan 1990

All Graduate Projects

This project involved the creation of an electronic gradebook designed to realistically meet the needs of classroom teachers. The computer source code was written in Pascal, a compiled language that allowed fast execution of any part of the program. The electronic gradebook called GradeS uses the graphic Desktop Interface. It tracks up to 42 students per class, allowing up to 50 assignments per grading period. An unlimited number of classes can be stored on data disks. It produces four different types of whole class and individual student reports both on the screen and the printer. The project includes a report …


Developing An Optical Scanner Card For Computerized Football Scouting, Curtis Lowell Byrnes Jan 1970

All Master's Theses

The purpose of this study is to develop an optical scanner scouting card to use in conjunction with the computer for football scouting.