Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Data Science

Series

Machine learning

Articles 1 - 29 of 29

Full-Text Articles in Physical Sciences and Mathematics

Dei: Exploring Academic Reflections Using Natural Language Processing To Create A Roadmap Of Student Success And Foster Inclusive Engineering Education, Rajvir H. Vyas, Nidhi Raviprasad Oct 2023

College of Engineering Summer Undergraduate Research Program

Every year, the College of Engineering (CENG) students and faculty reach out to admitted students through “Text-a-Thon” programs to answer their questions about being a student at Cal Poly. In order to improve CENG outreach efforts, we analyzed these text conversations to predict the likelihood of an admitted student accepting an offer of admission from Cal Poly. Through our research, we discovered key factors that play a role in a student committing to Cal Poly through data-based insights. Additionally, we successfully used a human-on-the-loop system to help create Machine Learning (ML) models that predict satisfaction of response by way of …


Verifying Empirical Predictive Modeling Of Societal Vulnerability To Hazardous Events: A Monte Carlo Experimental Approach, Yi Victor Wang, Seung Hee Kim, Menas C. Kafatos Aug 2023

Institute for ECHO Articles and Research

With the emergence of large amounts of historical records on adverse impacts of hazardous events, empirical predictive modeling has been revived as a foundational paradigm for quantifying disaster vulnerability of societal systems. This paradigm models societal vulnerability to hazardous events as a vulnerability curve indicating an expected loss rate of a societal system with respect to a possible spectrum of intensity measure (IM) of an event. Although the empirical predictive models (EPMs) of societal vulnerability are calibrated on historical data, they should not be experimentally tested with data derived from field experiments on any societal system. Alternatively, in this paper, …


Dense & Attention Convolutional Neural Networks For Toe Walking Recognition, Junde Chen, Rahul Soangra, Marybeth Grant-Beuttler, Y. A. Nanehkaran, Yuxin Wen May 2023

Physical Therapy Faculty Articles and Research

Idiopathic toe walking (ITW) is a gait disorder where children’s initial contacts show limited or no heel touch during the gait cycle. Toe walking can lead to poor balance, increased risk of falling or tripping, leg pain, and stunted growth in children. Early detection and identification can facilitate targeted interventions for children diagnosed with ITW. This study proposes a new one-dimensional (1D) Dense & Attention convolutional network architecture, termed DANet, to detect idiopathic toe walking. The dense block is integrated into the network to maximize information transfer and avoid missed features. Further, the attention modules are …


Comparing Igneous Geochemical Data From Hawaii And Southern California Via Machine Learning, Miro Manestar Apr 2023

MS in Computer Science Project Reports

Bi-plots are commonly used in geochemical analyses. However, their use can become cumbersome in the case of multivariate analyses. Therefore, this thesis explores the application of unsupervised machine learning techniques, specifically PCA and K-Means, to analyze large geochemical data sets from two distinct regions, Hawaii and the Peninsular Ranges Batholith (PRB) in Southern California. The IBM Foundational Methodology for Data Science was utilized to ensure proper data preparation and analysis. PCA provided dimensionality reduction, revealing which features correlated most strongly with variances within the data. K-Means clustering allowed for deeper interpretation of the data. The analysis yielded valuable insights into the composition and …


NVIZ: Unraveling Neural Networks Through Visualization, Kevin Hoffman Apr 2023

Mathematics and Computer Science Presentations

The growing utility of artificial intelligence (AI) is attributed to the development of neural networks. These networks are a class of models that make predictions based on previously observed data. While the inferential power of neural networks is great, the ability to explain their results is difficult because the underlying model is automatically generated. The AI community commonly refers to neural networks as black boxes because the patterns they learn from the data are not easily understood. This project aims to improve the visibility of patterns that neural networks identify in data. Through an interactive web application, NVIZ affords the …


Emotion Classification Of Indonesian Tweets Using Bidirectional LSTM, Aaron K. Glenn, Phillip M. Lacasse, Bruce A. Cox Feb 2023

Faculty Publications

Emotion classification can be a powerful tool to derive narratives from social media data. Traditional machine learning models that perform emotion classification on Indonesian Twitter data exist but rely on closed-source features. Recurrent neural networks can meet or exceed the performance of state-of-the-art traditional machine learning techniques using exclusively open-source data and models. Specifically, these results show that recurrent neural network variants can produce more than an 8% gain in accuracy in comparison with logistic regression and SVM techniques and a 15% gain over random forest when using FastText embeddings. This research found a statistical significance in the performance of …


Transfer Learning Using Infrared And Optical Full Motion Video Data For Gender Classification, Alexander M. Glandon, Joe Zalameda, Khan M. Iftekharuddin, Gabor F. Fulop (Ed.), David Z. Ting (Ed.), Lucy L. Zheng (Ed.) Jan 2023

Electrical & Computer Engineering Faculty Publications

This work is a review and extension of our ongoing research in human recognition analysis using multimodality motion sensor data. We review our work on hand crafted feature engineering for motion capture skeleton (MoCap) data, from the Air Force Research Lab for human gender followed by depth scan based skeleton extraction using LIDAR data from the Army Night Vision Lab for person identification. We then build on these works to demonstrate a transfer learning sensor fusion approach for using the larger MoCap and smaller LIDAR data for gender classification.


Health Care Equity Through Intelligent Edge Computing And Augmented Reality/Virtual Reality: A Systematic Review, Vishal Lakshminarayanan, Aswathy Ravikumar, Harini Sriraman, Sujatha Alla, Vijay Kumar Chattu Jan 2023

Engineering Management & Systems Engineering Faculty Publications

Intellectual capital is a scarce resource in the healthcare industry. Making the most of this resource is the first step toward achieving a completely intelligent healthcare system. However, most existing centralized and deep learning-based systems are unable to adapt to the growing volume of global health records and face application issues. To balance the scarcity of healthcare resources, the emerging trend of IoMT (Internet of Medical Things) and edge computing will be very practical and cost-effective. A full examination of the transformational role of intelligent edge computing in the IoMT era to attain health care equity is offered in this …


Machine Learning Prediction Of DoD Personal Property Shipment Costs, Tiffany Tucker [*], Torrey J. Wagner, Paul Auclair, Brent T. Langhals Jan 2023

Faculty Publications

U.S. Department of Defense (DoD) personal property moves account for 15% of all domestic and international moves; accurate prediction of their cost could draw attention to outlier shipments and improve budget planning. In this work, 136,140 shipments between 13 personal property shipment hubs from April 2022 through March 2023 with a total cost of $1.6B were analyzed. Shipment cost was predicted using recursive feature elimination on linear regression and XGBoost algorithms, as well as through neural network hyperparameter sweeps. Modeling was repeated after removing 28 features related to shipment hub location and branch of service to examine their influence …


Pyseg: A Python Package For 2d Material Flake Localization, Segmentation, And Thickness Prediction, Diana B. Horangic Dec 2022

Student Research Projects

Thin materials are of interest for their extraordinary physical, mechanical, thermal, electrical, and optical properties. Monolayers and bilayers of 2D materials can be manufactured through a variety of exfoliation methods. To determine layer thickness, Raman spectroscopy or other methods like Rayleigh scattering are used. These methods are, however, slow, and they require equipment beyond an optical microscope. A Python package that automates flake identification processes was built, with access solely to RGB data from an optical microscope assumed. My package, pyseg, localizes flakes on a substrate and then makes a rough estimate of their thickness from first principles. It can …


The Interaction Of Normalisation And Clustering In Sub-Domain Definition For Multi-Source Transfer Learning Based Time Series Anomaly Detection, Matthew Nicholson, Rahul Agrahari, Clare Conran, Haythem Assem, John D. Kelleher Dec 2022

Articles

This paper examines how data normalisation and clustering interact in the definition of sub-domains within multi-source transfer learning systems for time series anomaly detection. The paper introduces a distinction between (i) clustering as a primary/direct method for anomaly detection, and (ii) clustering as a method for identifying sub-domains within the source or target datasets. Reporting the results of three sets of experiments, we find that normalisation after feature extraction and before clustering results in the best performance for anomaly detection. Interestingly, we find that in the multi-source transfer learning scenario clustering on the target dataset and identifying subdomains in the …


Design Of Secure Communication Schemes To Provide Authentication And Integrity Among The IoT Devices, Vidya Rao Dr. Nov 2022

Technical Collection

The fast growth in Internet-of-Things (IoT) based applications has increased the number of end devices communicating over the Internet. These end devices are built with limited resources and run on low-capacity batteries. Such resource-constrained devices are exposed to various security and privacy concerns over publicly available Internet communication. Thus, it becomes essential to provide lightweight security solutions to safeguard data and user privacy. Elliptic Curve Cryptography (ECC) can be used to generate the digital signature and also encrypt the data. The method can be evaluated on a real-time testbed deployed using Raspberry Pi3 devices and every message transmitted is subjected to ECC. …


Real Time Call-Flagging System To Respond To Suicidal Ideation In Call Centers, Vishnu Menon, Joseph Carrigan, Charles Floeder, Thomas Walton, Devin Mcguire May 2022

Honors Theses

The 2021-2022 Signature Performance Design Studio team developed a live audio call-flagging system that enables faster responses and new response pathways to veteran crises by call service representatives and their management team. Using a custom-made deep learning model, a live audio streaming server, and a Teams broadcasting add-on, the system empowers Signature Performance call service representatives to make quicker and better-informed decisions to provide veterans the best care possible.


Volitional Control Of Lower-Limb Prosthesis With Vision-Assisted Environmental Awareness, S M Shafiul Hasan Mar 2022

FIU Electronic Theses and Dissertations

Early and reliable prediction of a user’s intention to change locomotion mode or speed is critical for a smooth and natural lower-limb prosthesis. Meanwhile, incorporation of explicit environmental feedback can facilitate a context-aware intelligent prosthesis that allows seamless operation across a variety of gait demands. This dissertation introduces environmental awareness through computer vision and enables early and accurate prediction of intention to start, stop or change speeds while walking. Electromyography (EMG), Electroencephalography (EEG), Inertial Measurement Unit (IMU), and Ground Reaction Force (GRF) sensors were used to predict intention to start, stop or increase walking speed. Furthermore, it was investigated whether …


Assessing Feature Representations For Instance-Based Cross-Domain Anomaly Detection In Cloud Services Univariate Time Series Data, Rahul Agrahari, Matthew Nicholson, Clare Conran, Haythem Assem, John D. Kelleher Jan 2022

Articles

In this paper, we compare and assess the efficacy of a number of time-series instance feature representations for anomaly detection. To assess whether there are statistically significant differences between different feature representations for anomaly detection in a time series, we calculate and compare confidence intervals on the average performance of different feature sets across a number of different model types and cross-domain time-series datasets. Our results indicate that the catch22 time-series feature set augmented with features based on rolling mean and variance performs best on average, and that the difference in performance between this feature set and the next best …


Development Of Advanced Machine Learning Models For Analysis Of Plutonium Surrogate Optical Emission Spectra, Ashwin P. Rao, Phillip R. Jenkins, John D. Auxier II, Michael B. Shattan, Anil Patnaik Jan 2022

Faculty Publications

This work investigates and applies machine learning paradigms seldom seen in analytical spectroscopy for quantification of gallium in cerium matrices via processing of laser-plasma spectra. Ensemble regressions, support vector machine regressions, Gaussian kernel regressions, and artificial neural network techniques are trained and tested on cerium-gallium pellet spectra. A thorough hyperparameter optimization experiment is conducted initially to determine the best design features for each model. The optimized models are evaluated for sensitivity and precision using the limit of detection (LoD) and root mean-squared error of prediction (RMSEP) metrics, respectively. Gaussian kernel regression yields the superlative predictive model with an RMSEP of …


Facial Landmark Feature Fusion In Transfer Learning Of Child Facial Expressions, Megan A. Witherow, Manar D. Samad, Norou Diawara, Khan M. Iftekharuddin Jan 2022

Electrical & Computer Engineering Faculty Publications

Automatic classification of child facial expressions is challenging due to the scarcity of image samples with annotations. Transfer learning of deep convolutional neural networks (CNNs), pretrained on adult facial expressions, can be effectively finetuned for child facial expression classification using limited facial images of children. Recent work inspired by facial age estimation and age-invariant face recognition proposes a fusion of facial landmark features with deep representation learning to augment facial expression classification performance. We hypothesize that deep transfer learning of child facial expressions may also benefit from fusing facial landmark features. Our proposed model architecture integrates two input branches: a …


Comparing Machine Learning Techniques With State-Of-The-Art Parametric Prediction Models For Predicting Soybean Traits, Susweta Ray Dec 2021

Department of Statistics: Dissertations, Theses, and Student Work

Soybean is a significant source of protein and oil, and is also widely used as animal feed. Thus, developing lines that are superior in terms of yield, protein and oil content is important to feed the ever-growing population. As opposed to high-cost phenotyping, genotyping is both cost- and time-efficient for breeders, while evaluating new lines in different environments (location-year combinations) can be costly. Several genomic prediction (GP) methods have been developed to use the marker and environment data effectively to predict the yield or other relevant phenotypic traits of crops. Our study compares a conventional GP method (GBLUP), a …


Performance Of OpenBCI EEG Binary Intent Classification With Laryngeal Imagery, Nathan George, Samuel Kuhn Jul 2021

Regis University Faculty Publications (comprehensive list)

One of the greatest goals of neuroscience in recent decades has been to rehabilitate individuals who no longer have a functional relationship between their mind and their body. Although neuroscience has produced technologies which allow the brains of paralyzed patients to accomplish tasks such as spelling words or controlling a motorized wheelchair, these technologies utilize parts of the brain which may not be optimal for simultaneous use. For example, if you needed to look at flashing lights to spell words for communication, it would be difficult to simultaneously look at where you are moving. To improve upon this issue, this …


Per-Pixel Cloud Cover Classification Of Multispectral Landsat-8 Data, Salome E. Carrasco [*], Torrey J. Wagner, Brent T. Langhals Jun 2021

Faculty Publications

Random forest and neural network algorithms are applied to identify cloud cover using 10 of the wavelength bands available in Landsat 8 imagery. The methods classify each pixel into 4 different classes: clear, cloud shadow, light cloud, or cloud. The first method is based on a fully connected neural network with ten input neurons, two hidden layers of 8 and 10 neurons respectively, and a single-neuron output for each class. This type of model is considered with and without L2 regularization applied to the kernel weighting. The final model type is a random forest classifier created from an ensemble of …


Detecting Hacker Threats: Performance Of Word And Sentence Embedding Models In Identifying Hacker Communications, Susan Mckeever, Brian Keegan, Andrei Quieroz Dec 2020

Conference papers

Cyber security is striving to find new forms of protection against hacker attacks. An emerging approach nowadays is the investigation of security-related messages exchanged on deep/dark web and even surface web channels. This approach can be supported by the use of supervised machine learning models and text mining techniques. In our work, we compare a variety of machine learning algorithms, text representations and dimension reduction approaches for the detection accuracies of software-vulnerability-related communications. Given the imbalanced nature of the three public datasets used, we investigate appropriate sampling approaches to boost detection accuracies of our models. In addition, we examine how …


Using Data Analytics To Predict Students Score, Nang Laik Ma, Gim Hong Chua Nov 2020

Research Collection School Of Computing and Information Systems

Education is very important to Singapore, and the government has continued to invest heavily in our education system to become one of the world-class systems today. A strong foundation of Science, Technology, Engineering, and Mathematics (STEM) was what underpinned Singapore's development over the past 50 years. PISA is a triennial international survey that evaluates education systems worldwide by testing the skills and knowledge of 15-year-old students who are nearing the end of compulsory education. In this paper, the authors used the PISA data from 2012 and 2015 and developed machine learning techniques to predict the students' scores and understand the …


A New Efficient Method To Detect Genetic Interactions For Lung Cancer GWAS, Jennifer Luyapan, Xuemei Ji, Siting Li, Xiangjun Xiao, Dakai Zhu, Eric J. Duell, David C. Christiani, Matthew B. Schabath, Susanne M. Arnold, Shanbeh Zienolddiny, Hans Brunnström, Olle Melander, Mark D. Thornquist, Todd A. Mackenzie, Christopher I. Amos, Jiang Gui Oct 2020

Markey Cancer Center Faculty Publications

BACKGROUND: Genome-wide association studies (GWAS) have proven successful in predicting genetic risk of disease using single-locus models; however, identifying single nucleotide polymorphism (SNP) interactions at the genome-wide scale is limited due to computational and statistical challenges. We addressed the computational burden encountered when detecting SNP interactions for survival analysis, such as age of disease-onset. To confront this problem, we developed a novel algorithm, called the Efficient Survival Multifactor Dimensionality Reduction (ES-MDR) method, which used Martingale Residuals as the outcome parameter to estimate survival outcomes, and implemented the Quantitative Multifactor Dimensionality Reduction method to identify significant interactions associated with age of …


European Floating Strike Lookback Options: Alpha Prediction And Generation Using Unsupervised Learning, Tristan Lim, Aldy Gunawan, Chin Sin Ong Oct 2020

Research Collection School Of Computing and Information Systems

This research utilized the intrinsic quality of European floating strike lookback call options, alongside selected return and volatility parameters, in a K-means clustering environment, to recommend an alpha generative trading strategy. The result is an elegant easy-to-use alpha strategy based on the option mechanisms which identifies investment assets with high degree of significance. In an upward trending market, the research had identified European floating strike lookback call option as an evaluative criterion and investable asset, which would both allow investors to predict and profit from alpha opportunities. The findings will be useful for (i) buy-side investors seeking alpha generation and/or …


Cooperative Co-Evolution For Feature Selection In Big Data With Random Feature Grouping, A.N.M. Bazlur Rashid, Mohiuddin Ahmed, Leslie F. Sikos, Paul Haskell-Dowland Jan 2020

Research outputs 2014 to 2021

© 2020, The Author(s). A massive amount of data is generated with the evolution of modern technologies. This high-throughput data generation results in Big Data, which consist of many features (attributes). However, irrelevant features may degrade the classification performance of machine learning (ML) algorithms. Feature selection (FS) is a technique used to select a subset of relevant features that represent the dataset. Evolutionary algorithms (EAs) are widely used search strategies in this domain. A variant of EAs, called cooperative co-evolution (CC), which uses a divide-and-conquer approach, is a good choice for optimization problems. The existing solutions have poor performance because …


Modelling Interleaved Activities Using Language Models, Eoin Rogers, Robert J. Ross, John D. Kelleher Jan 2020

Conference papers

We propose a new approach to activity discovery, based on the neural language modelling of streaming sensor events. Our approach proceeds in multiple stages: we build binary links between activities using probability distributions generated by a neural language model trained on the dataset, and combine the binary links to produce complex activities. We then use the activities as sensor events, allowing us to build complex hierarchies of activities. We put an emphasis on dealing with interleaving, which represents a major challenge for many existing activity discovery systems. The system is tested on a realistic dataset, demonstrating it as a promising …


Disaster Damage Categorization Applying Satellite Images And Machine Learning Algorithm, Farinaz Sabz Ali Pour, Adrian Gheorghe Jan 2020

Engineering Management & Systems Engineering Faculty Publications

Spatial information has a significant role in disaster management. Land cover mapping can detect short- and long-term changes and monitor vulnerable habitats. It is an effective evaluation to include in the disaster management system to protect conservation areas. The critical visual and statistical information presented to decision-makers can help in mitigation or adaptation before a threshold is crossed. This paper aims to contribute to both academic and practical aspects by offering a potential solution to enhance the effectiveness of disaster data sources. The key research question that the authors try to answer in this paper is how …


Watersheds For Semi-Supervised Classification, Aditya Challa, Sravan Danda, B. S. Daya Sagar, Laurent Najman May 2019

Journal Articles

The watershed technique from mathematical morphology (MM) is one of the most widely used operators for image segmentation. Recently, watersheds have been adapted to edge-weighted graphs, allowing for wider applicability. However, a few questions remain to be answered: How do the boundaries of the watershed operator behave? Which loss function does the watershed operator optimize? How does the watershed operator relate to existing ideas from machine learning? In this letter, a framework is developed, which allows one to answer these questions. This is achieved by generalizing the maximum margin principle to maximum margin partition and proposing a generic solution, morphMedian, resulting …


Constructing Interactive Visual Classification, Clustering And Dimension Reduction Models For N-D Data, Boris Kovalerchuk, Dmytro Dovhalets Jul 2017

Computer Science Faculty Scholarship

The exploration of multidimensional datasets of all possible sizes and dimensions is a long-standing challenge in knowledge discovery, machine learning, and visualization. While multiple efficient visualization methods for n-D data analysis exist, the loss of information, occlusion, and clutter continue to be a challenge. This paper proposes and explores a new interactive method for visual discovery of n-D relations for supervised learning. The method includes automatic, interactive, and combined algorithms for discovering linear relations, dimension reduction, and generalization for non-linear relations. This method is a special category of reversible General Line Coordinates (GLC). It produces graphs in 2-D that represent …