Open Access. Powered by Scholars. Published by Universities.®

Operations Research, Systems Engineering and Industrial Engineering Commons

Full-Text Articles in Operations Research, Systems Engineering and Industrial Engineering

Examination And Utilization Of Rare Features In Text Classification Of Injury Narratives, Hsin-Ying Huang Dec 2016

Open Access Dissertations

Thanks to advances in computing and information technology, analyzing injury surveillance data with statistical machine learning methods has grown in popularity, complexity, and quality in recent years. Over the same period, researchers have recognized the limitations of statistical text analysis when training data are limited. In response to the two primary challenges for statistical text analysis, high dimensionality and sparse data, many studies have focused on improving machine learning algorithms. Less research has been done, though, to examine and improve statistical machine learning methods for text classification from a linguistic perspective.

This study addresses this research gap by examining the …
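The abstract is cut off before it describes the study's method, so the sketch below is not the dissertation's approach. It only illustrates, on three made-up injury narratives, what "rare features" and "sparse data" mean for a binary bag-of-words representation. The function names and the `max_df` cutoff are hypothetical.

```python
# Three hypothetical injury narratives, invented for illustration only.
narratives = [
    "worker fell from ladder",
    "worker cut hand on saw",
    "fell on wet floor",
]


def document_frequencies(docs):
    """Count how many documents contain each lowercase, whitespace-split token."""
    from collections import Counter

    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))  # set(): count each token once per document
    return df


def rare_features(docs, max_df=1):
    """Return tokens that occur in at most `max_df` documents (hypothetical cutoff)."""
    df = document_frequencies(docs)
    return sorted(token for token, n in df.items() if n <= max_df)


def sparsity(docs):
    """Fraction of zero cells in the binary document-term matrix."""
    df = document_frequencies(docs)
    total_cells = len(docs) * len(df)
    # Each (document, distinct token) pair is one nonzero cell.
    nonzero_cells = sum(df.values())
    return 1 - nonzero_cells / total_cells


if __name__ == "__main__":
    print(rare_features(narratives))
    print(f"{sparsity(narratives):.2f}")
```

Even in this tiny example, 7 of the 10 vocabulary terms appear in only one narrative, and 17 of the 30 cells in the document-term matrix are zero, so a classifier trained on such data sees most terms only once.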


Methods To Address Extreme Class Imbalance In Machine Learning Based Network Intrusion Detection Systems, Russell W. Walter Mar 2016

Theses and Dissertations

Despite the considerable academic interest in using machine learning methods to detect cyber attacks and malicious network traffic, there is little evidence that modern organizations employ such systems. Due to the targeted nature of attacks and cybercriminals’ constantly changing behavior, valid observations of attack traffic suitable for training a classifier are extremely rare. Rare positive cases combined with the fact that the overwhelming majority of network traffic is benign create an extreme class imbalance problem. Using publicly available datasets, this research examines the class imbalance problem by using small samples of the attack observations to create multiple training sets that …
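The abstract stops before describing its method, so the sketch below makes no claim about how the thesis samples attack observations or builds its training sets. It only illustrates, with made-up counts (9,990 benign and 10 attack observations), why extreme class imbalance is a problem: a baseline that never predicts an attack scores high accuracy while detecting none of the attacks. The function names are hypothetical.

```python
def imbalance_ratio(labels, positive=1):
    """Ratio of majority (negative) to minority (positive) examples."""
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = len(labels) - n_pos
    return n_neg / n_pos


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Share of actual positives (attacks) that the classifier flags."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical traffic labels: 0 = benign, 1 = attack. Counts are made up.
y_true = [0] * 9_990 + [1] * 10
# A trivial baseline that labels every observation as benign.
y_pred = [0] * len(y_true)

if __name__ == "__main__":
    print(f"imbalance ratio: {imbalance_ratio(y_true):.0f}:1")
    print(f"accuracy:        {accuracy(y_true, y_pred):.3f}")
    print(f"attack recall:   {recall(y_true, y_pred):.3f}")
```

Here the all-benign baseline reaches 0.999 accuracy at a 999:1 imbalance ratio, yet its recall on the attack class is 0. This is why minority-class metrics such as recall are more informative than overall accuracy when positives are this rare.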