Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 7 of 7

Full-Text Articles in Physical Sciences and Mathematics

Toward Efficient Automation Of Interpretable Machine Learning Boosting, Nathan Neuhaus Jan 2020

All Master's Theses

Developing efficient automated methods for Interpretable Machine Learning (IML) is an important and long-term goal in the field of Artificial Intelligence. Currently, the Machine Learning landscape is dominated by Neural Networks (NNs) and Support Vector Machines (SVMs), models which are often highly accurate. Despite their high accuracy, such models are essentially “black boxes” and are therefore too risky for settings like healthcare, where real lives are at stake. In such situations, so-called “glass-box” models, such as Decision Trees (DTs), Bayesian Networks (BNs), and Logic Relational (LR) models, are often preferred; however, they can suffer from accuracy limitations. Unfortunately, having to choose …


Deep Learning Of 2-D Images Representing N-D Data In General Line Coordinates, Dmytro Dovhalets, Boris Kovalerchuk, Szilárd Vajda, Răzvan Andonie Jan 2018

Computer Science Faculty Scholarship

While knowledge discovery and n-D data visualization procedures are often efficient, the loss of information, occlusion, and clutter continue to be a challenge. General Line Coordinates (GLC) is a relatively new technique for dealing with such artifacts. GLC-Linear, one of the GLC methods, losslessly transforms n-D numerical data into a visual representation as polylines. The method proposed in this paper uses these 2-D visual representations as input to Convolutional Neural Network (CNN) classifiers. The obtained classification accuracies are close to the ones obtained by other machine learning algorithms. The main benefit of the method is the …


Data Visualization And Classification Of Artificially Created Images, Dmytro Dovhalets Jan 2018

All Master's Theses

Visualization of multidimensional data is a long-standing challenge in machine learning and knowledge discovery. A problem arises as soon as a fourth dimension is introduced, since we live in a 3-dimensional world. Existing methods can visualize multidimensional data, but loss of information and clutter are still a problem. General Line Coordinates (GLC) can losslessly project n-dimensional data into 2 dimensions. A new method based on GLC, called GLC-L, is introduced. This new method supports interactive visualization, dimension reduction, and supervised learning. One of the applications of GLC-L is the transformation of vector data into image data. This novel …


Asymptotically Unbiased Estimation Of A Nonsymmetric Dependence Measure Applied To Sensor Data Analytics And Financial Time Series, Angel Caţaron, Razvan Andonie, Yvonne Chueh Aug 2017

All Faculty Scholarship for the College of the Sciences

A fundamental concept frequently applied to statistical machine learning is the detection of dependencies between unknown random variables found from data samples. In previous work, we have introduced a nonparametric unilateral dependence measure based on Onicescu’s information energy and a kNN method for estimating this measure from an available sample set of discrete or continuous variables. This paper provides the formal proofs which show that the estimator is asymptotically unbiased and that its variance tends to zero as the sample size increases. This implies that the estimator has good statistical qualities. We investigate the performance of the estimator for data analysis applications …


Constructing Interactive Visual Classification, Clustering And Dimension Reduction Models For N-D Data, Boris Kovalerchuk, Dmytro Dovhalets Jul 2017

Computer Science Faculty Scholarship

The exploration of multidimensional datasets of all possible sizes and dimensions is a long-standing challenge in knowledge discovery, machine learning, and visualization. While multiple efficient visualization methods for n-D data analysis exist, the loss of information, occlusion, and clutter continue to be a challenge. This paper proposes and explores a new interactive method for visual discovery of n-D relations for supervised learning. The method includes automatic, interactive, and combined algorithms for discovering linear relations, dimension reduction, and generalization for non-linear relations. This method is a special category of reversible General Line Coordinates (GLC). It produces graphs in 2-D that represent …


Visual Knowledge Discovery And Machine Learning For Investment Strategy, Antoni Wilinski, Boris Kovalerchuk May 2017

All Faculty Scholarship for the College of the Sciences

Knowledge discovery is an important aspect of human cognition. The advantage of the visual approach is the opportunity to replace some complex cognitive tasks with easier perceptual tasks. However, for cognitive tasks such as financial investment decision making, this opportunity faces the challenge that financial data are abstract, multidimensional, and multivariate, i.e., outside of traditional visual perception in a 2D or 3D world. This paper presents an approach to finding an investment strategy based on pattern discovery in a multidimensional space of specifically prepared time series. Visualization based on the lossless Collocated Paired Coordinates (CPC) plays an important role in this approach …


Asymptotically Unbiased Estimator Of The Informational Energy With Knn, Angel Caţaron, Răzvan Andonie, Chinmei Y. Chueh Oct 2013

All Faculty Scholarship for the College of the Sciences

Motivated by machine learning applications (e.g., classification, function approximation, feature extraction), in previous work, we have introduced a nonparametric estimator of Onicescu’s informational energy. Our method was based on the k-th nearest neighbor distances between the n sample points, where k is a fixed positive integer. In the present contribution, we discuss mathematical properties of this estimator. We show that our estimator is asymptotically unbiased and consistent. We provide further experimental results which illustrate the convergence of the estimator for standard distributions.
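
For readers unfamiliar with the idea, the following is a minimal sketch of a k-th nearest neighbor plug-in estimate of the informational energy of a continuous 1-D variable, IE = ∫ f(x)² dx, which equals the expected value of the density f(X). It is an illustration under stated assumptions, not the authors' implementation: the function name `knn_informational_energy`, the restriction to one dimension, the brute-force neighbor search, and the density normalization (k − 1) / ((n − 1) · volume) are all choices made here and may differ from the estimator analyzed in the paper.

```python
def knn_informational_energy(sample, k=5):
    """Plug-in kNN estimate of IE = integral of f(x)^2 dx for 1-D data.

    Assumption (not taken from the listing): the density at each sample
    point is estimated as (k - 1) / ((n - 1) * volume), where volume is
    the length of the interval reaching the k-th nearest neighbor.
    """
    n = len(sample)
    if k < 2 or n <= k:
        raise ValueError("need k >= 2 and more than k sample points")

    total = 0.0
    for i, x in enumerate(sample):
        # Brute-force distances from x to every other sample point.
        dists = sorted(abs(x - y) for j, y in enumerate(sample) if j != i)
        r_k = dists[k - 1]  # distance to the k-th nearest neighbor
        if r_k == 0.0:
            raise ValueError("duplicate points: k-th neighbor distance is zero")
        volume = 2.0 * r_k  # length of [x - r_k, x + r_k] in one dimension
        total += (k - 1) / ((n - 1) * volume)

    # IE is the expected density, so average the per-point density estimates.
    return total / n
```

As a sanity check, the informational energy of the uniform distribution on [0, 1] is 1, and that of the standard normal distribution is 1 / (2√π) ≈ 0.282, so calling the function on a large sample from either distribution should return a value near those numbers. The brute-force search costs on the order of n² log n operations; a spatial index would be the natural replacement for larger samples.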