Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Air Force Institute of Technology

Theses/Dissertations

Data mining

Articles 1 - 8 of 8

Full-Text Articles in Physical Sciences and Mathematics

Generative Methods, Meta-Learning, And Meta-Heuristics For Robust Cyber Defense, Marc W. Chale Sep 2022

Theses and Dissertations

Cyberspace is the digital communications network that supports the internet of battlefield things (IoBT), the model by which defense-centric sensors, computers, actuators, and humans are digitally connected. A secure IoBT infrastructure facilitates real-time implementation of the observe, orient, decide, act (OODA) loop across distributed subsystems. Successful hacking efforts by cyber criminals and strategic adversaries suggest that cyber systems such as the IoBT are not secure. Three lines of effort demonstrate a path towards a more robust IoBT. First, a baseline data set of enterprise cyber network traffic was collected and modelled with generative methods allowing the generation of realistic, …


Innovative Heuristics To Improve The Latent Dirichlet Allocation Methodology For Textual Analysis And A New Modernized Topic Modeling Approach, Jamie T. Zimmerman Jun 2022

Theses and Dissertations

Natural Language Processing is a complex method of data mining the vast trove of documents created and made available every day. Topic modeling seeks to identify the topics within textual corpora with limited human input into the process to speed analysis. Current topic modeling techniques used in Natural Language Processing have limitations in the pre-processing steps. This dissertation studies topic modeling techniques and their pre-processing limitations, and introduces new algorithms that improve on existing topic modeling techniques while remaining competitive in computational complexity. This research introduces four contributions to the field of Natural Language Processing and topic modeling. …


Constructing Prediction Intervals With Neural Networks: An Empirical Evaluation Of Bootstrapping And Conformal Inference Methods, Alexander N. Contarino Mar 2022

Theses and Dissertations

Artificial neural networks (ANNs) are popular tools for accomplishing many machine learning tasks, including predicting continuous outcomes. However, the general lack of confidence measures provided with ANN predictions limits their applicability, especially in military settings where accuracy is paramount. Supplementing point predictions with prediction intervals (PIs) is common for other learning algorithms, but the complex structure and training of ANNs render constructing PIs difficult. This work provides the network design choices and inferential methods for creating better performing PIs with ANNs to enable their adaptation for military use. A two-step experiment is executed across 11 datasets, including an image-based dataset. …


Telemetry Data Mining For Unmanned Aircraft Systems, Li Yu Mar 2022

Theses and Dissertations

With ever more data becoming available to the US Air Force, it is vital to develop effective methods to leverage this strategic asset. Machine learning (ML) techniques present a means of meeting this challenge, as these tools have demonstrated successful use in commercial applications. For this research, three ML methods were applied to an unmanned aircraft system (UAS) telemetry dataset with the aim of extracting useful insight related to phases of flight. It was shown that ML provides an advantage in exploratory data analysis as well as in classification of phases. Neural network models demonstrated the best performance with over …


Detecting Potential Insider Threats Through Email Datamining, James S. Okolica Mar 2006

Theses and Dissertations

No abstract provided.


Efficient Generation Of Social Network Data From Computer-Mediated Communication Logs, Jason Wei Sung Yee Mar 2005

Theses and Dissertations

The insider threat poses a significant risk to any network or information system. A general definition of the insider threat is an authorized user performing unauthorized actions, a broad definition with no specifications on severity or action. While limited research has been able to classify and detect insider threats, it is generally understood that insider attacks are planned, and that there is a time period in which the organization's leadership can intervene and prevent the attack. Previous studies have shown that the person's behavior will generally change, and it is possible that social network analysis could be used to observe …


Using Sequence Analysis To Perform Application-Based Anomaly Detection Within An Artificial Immune System Framework, Larissa A. O'Brien Mar 2003

Theses and Dissertations

The Air Force and other Department of Defense (DoD) computer systems typically rely on traditional signature-based network intrusion detection systems (IDSs) to detect various types of attempted or successful attacks. Signature-based methods are limited to detecting known attacks or similar variants; anomaly-based systems, by contrast, alert on behaviors previously unseen. The development of an effective anomaly-detecting, application-based IDS would increase the Air Force's ability to ward off attacks that are not detected by signature-based network IDSs, thus strengthening the layered defenses necessary to acquire and maintain safe, secure communication capability. This system follows the Artificial Immune System (AIS) framework, which relies on …


Data Mining Feature Subset Weighting And Selection Using Genetic Algorithms, Okan Yilmaz Mar 2002

Theses and Dissertations

We present a simple genetic algorithm (sGA), developed under the Genetic Rule and Classifier Construction Environment (GRaCCE), to solve the feature subset selection and weighting problem and thereby improve the classification accuracy of the k-nearest neighbor (KNN) algorithm. Our hypotheses are that weighting the features will affect the performance of the KNN algorithm and will yield a better classification accuracy rate than that of binary classification. The weighted-sGA algorithm uses real-value chromosomes to find the weights for features, and binary-sGA uses integer-value chromosomes to select the subset of features from the original feature set. A Repair algorithm is developed for the weighted-sGA algorithm to guarantee …