Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences

Virginia Commonwealth University

Theses and Dissertations

2021

Imbalanced data

Full-Text Articles in Physical Sciences and Mathematics

Learning From Multi-Class Imbalanced Big Data With Apache Spark, William C. Sleeman IV, Jan 2021

Theses and Dissertations

With data becoming a new form of currency, its analysis has become a top priority in both academia and industry, furthering advancements in high-performance computing and machine learning. However, large, real-world datasets come with additional complications such as noise and class overlap. These problems are magnified for multi-class data, especially since many popular algorithms were originally designed for binary data. Another challenge arises when the number of examples is not evenly distributed across the classes in a dataset. This imbalance often causes classifiers to favor the majority class over the minority classes, leading to undesirable results …
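To make the last point concrete, here is a small illustrative sketch that is not taken from the thesis. It assumes a hypothetical three-class label set with 90, 8, and 2 examples, and the helper names `class_distribution` and `per_class_recall` are invented for illustration. A degenerate classifier that always predicts the majority class reaches high overall accuracy while never recognizing either minority class, which is why per-class or macro-averaged metrics are commonly reported for imbalanced data.

```python
from collections import Counter


def class_distribution(labels):
    """Return ({class: count}, imbalance ratio = largest class / smallest class)."""
    counts = Counter(labels)
    ratio = max(counts.values()) / min(counts.values())
    return dict(counts), ratio


def per_class_recall(y_true, y_pred):
    """Recall per class: correctly predicted examples of a class / true examples of it."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / n for c, n in totals.items()}


# Hypothetical, illustrative labels (not data from the thesis): 90 / 8 / 2 examples.
y_true = ["A"] * 90 + ["B"] * 8 + ["C"] * 2

# A degenerate classifier that always predicts the most frequent class.
majority = Counter(y_true).most_common(1)[0][0]
y_pred = [majority] * len(y_true)

counts, ratio = class_distribution(y_true)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recalls = per_class_recall(y_true, y_pred)
macro_recall = sum(recalls.values()) / len(recalls)

print(counts, ratio)            # {'A': 90, 'B': 8, 'C': 2} 45.0
print(accuracy)                 # 0.9
print(recalls)                  # {'A': 1.0, 'B': 0.0, 'C': 0.0}
print(round(macro_recall, 3))   # 0.333
```

Accuracy rewards the majority-only predictor with 90%, while macro-averaged recall, which weights each class equally, drops to one third and exposes that the two minority classes are never predicted.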