Data Science | Open Access Articles | Digital Commons Network™

Adaptive Multi-Label Classification On Drifting Data Streams, Martha Roseberry

Theses and Dissertations

Drifting data streams and multi-label data are both challenging problems. When multi-label data arrives as a stream, the challenges of both problems must be addressed along with additional challenges unique to the combined problem. Algorithms must be fast and flexible, able to match both the speed and evolving nature of the stream. We propose four methods for learning from multi-label drifting data streams. First, a multi-label k Nearest Neighbors algorithm with Self Adjusting Memory (ML-SAM-kNN) exploits short- and long-term memories to predict the current and evolving states of the data stream. Second, a punitive k nearest neighbors algorithm with a self-adjusting …
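To make the idea of a memory-based multi-label learner concrete, here is a minimal sketch of multi-label kNN over a fixed-size sliding window. It is not ML-SAM-kNN: it keeps only a short-term memory (the most recent examples) and omits the long-term memory and self-adjusting logic the abstract describes. The class name `SlidingWindowMLkNN`, the majority-vote label rule, and the window and `k` values are illustrative assumptions.

```python
from collections import deque
import math


class SlidingWindowMLkNN:
    """Hypothetical sketch: multi-label kNN over a fixed-size sliding window.

    Only a short-term memory of the most recent `window_size` examples is kept;
    the long-term memory and self-adjusting behavior of ML-SAM-kNN are NOT modeled.
    """

    def __init__(self, k=3, window_size=100, num_labels=3):
        self.k = k
        self.num_labels = num_labels
        # A deque with maxlen discards the oldest example automatically,
        # so the memory follows the most recent state of the stream.
        self.window = deque(maxlen=window_size)

    def learn_one(self, x, labels):
        # `labels` is the set of active label indices for this example.
        self.window.append((tuple(x), frozenset(labels)))

    def predict_one(self, x):
        if not self.window:
            return set()
        # The k stored examples closest to x by Euclidean distance.
        neighbors = sorted(self.window, key=lambda item: math.dist(item[0], x))[: self.k]
        # Count how many neighbors carry each label.
        counts = [0] * self.num_labels
        for _, labels in neighbors:
            for label in labels:
                counts[label] += 1
        # Predict every label carried by a strict majority of the neighbors.
        return {j for j, c in enumerate(counts) if c > len(neighbors) / 2}


# Toy stream (made-up data): the concept drifts from label 0 near the origin
# to label 1 near (5, 5).
stream = [
    ([0.0, 0.0], {0}), ([0.1, 0.0], {0}), ([0.0, 0.1], {0, 1}),
    ([5.0, 5.0], {1}), ([5.1, 5.0], {1}), ([5.0, 5.1], {1}),
]
model = SlidingWindowMLkNN(k=3, window_size=4, num_labels=2)
for x, y in stream[:3]:
    model.learn_one(x, y)
before_drift = model.predict_one([0.0, 0.0])
for x, y in stream[3:]:
    model.learn_one(x, y)
after_drift = model.predict_one([0.0, 0.0])
print(before_drift, after_drift)
```

Because the window holds only four examples, the label-0 examples are evicted once the drifted examples arrive, so the prediction for the same query point changes from `{0}` to `{1}`. A purely short-term memory like this adapts quickly but forgets older concepts entirely, which is the trade-off that pairing it with a long-term memory is meant to address.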


Learning From Multi-Class Imbalanced Big Data With Apache Spark, William C. Sleeman IV

Theses and Dissertations

With data becoming a new form of currency, its analysis has become a top priority in both academia and industry, furthering advancements in high-performance computing and machine learning. However, large, real-world datasets come with additional complications such as noise and class overlap. These problems are magnified when the data is multi-class, especially since many popular algorithms were originally designed for binary data. Another challenge arises when the number of examples is not evenly distributed across all classes in a dataset. This often causes classifiers to favor the majority class over the minority classes, leading to undesirable results …
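The following sketch illustrates why uneven class distributions lead to misleading results. It scores a trivial classifier that always predicts the most frequent class on a made-up three-class label set (90 / 8 / 2 examples, not data from the dissertation), comparing overall accuracy with per-class recall. The function name `majority_baseline_report` is hypothetical, and macro-averaged recall is just one common imbalance-aware metric chosen for illustration.

```python
from collections import Counter


def majority_baseline_report(y_true):
    """Score a classifier that always predicts the most frequent class."""
    majority, _ = Counter(y_true).most_common(1)[0]
    y_pred = [majority] * len(y_true)
    # Overall accuracy: dominated by the majority class.
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # Per-class recall: fraction of each class's examples predicted correctly.
    recalls = {}
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls[cls] = sum(y_pred[i] == cls for i in idx) / len(idx)
    # Macro recall weights every class equally, regardless of its size.
    macro_recall = sum(recalls.values()) / len(recalls)
    return accuracy, recalls, macro_recall


# Hypothetical imbalanced labels: 90 majority, 8 and 2 minority examples.
y = ["a"] * 90 + ["b"] * 8 + ["c"] * 2
acc, per_class, macro = majority_baseline_report(y)
print(acc, per_class, macro)
```

The baseline reaches 90% accuracy while never recognizing a single minority-class example, so its macro recall is only about 0.33. Metrics that weight classes equally expose this bias, which accuracy alone hides.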


Reliable And Interpretable Machine Learning For Modeling Physical And Cyber Systems, Daniel L. Marino Lizarazo

Theses and Dissertations

Over the past decade, Machine Learning (ML) research has predominantly focused on building extremely complex models in order to improve predictive performance. The idea was that performance could be improved by adding complexity to the models. This approach proved successful in creating models that can approximate highly complex relationships while taking advantage of large datasets. However, it also led to extremely complex black-box models that lack reliability and are difficult to interpret. By lack of reliability, we specifically refer to inconsistent (unpredictable) behavior in situations outside the training data. Lack of interpretability refers to the …

