Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
- Keyword
- Data mining (3)
- Machine learning (2)
- Anomaly (1)
- Bootstrapping (1)
- Conformal inference (1)
- Cumulative Distribution Function (1)
- Datasets (1)
- Deep learning (1)
- Explainable artificial intelligence (1)
- Feature selection (1)
- GNSS (1)
- GPS (1)
- Gaussian mixture (1)
- Gradient boosting (1)
- Information science (1)
- Interference (1)
- Kolmogorov-Smirnov Test (1)
- LIBS (1)
- Latent Dirichlet Allocation (1)
- Machine learning interpretability (1)
- Multivariate data (1)
- Natural language processing (1)
- Neural networks (1)
- PI (Prediction interval) (1)
- Phase of flight (1)
- Pu surrogate (1)
- Textual analysis (1)
- Unmanned aircraft systems (1)
- Word clouds (1)
Full-Text Articles in Physical Sciences and Mathematics
Modern Approaches And Theoretical Extensions To The Multivariate Kolmogorov Smirnov Test, Gonzalo Hernando
Theses and Dissertations
Most statistical tests are fully developed for univariate data, but when inference is required for multivariate data, applying univariate tests risks losing information and interpretability. This research 1) derives and extends the multivariate Kolmogorov-Smirnov (KS) test to two dimensions and then to m dimensions, 2) derives small-sample critical values for the KS test that do not rely on sample-size simulations or on the correlation between variables, 3) extends large-sample estimations and current KS implementations, and 4) provides sample size and power calculations to enable experimental design when testing for differences in distributions. Through extensive simulation, we demonstrate that our …
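For readers unfamiliar with the baseline being extended, the sketch below computes the classical univariate two-sample KS statistic, D = sup_t |F_x(t) - F_y(t)|, and the standard large-sample critical value c(α)·sqrt((n + m)/(nm)) with c(α) = sqrt(-ln(α/2)/2). It is only the ordinary one-dimensional test, not the multivariate extension or the small-sample critical values derived in this thesis; the function names are illustrative.

```python
import math
from bisect import bisect_right


def ks_two_sample(x, y):
    """Two-sample KS statistic D = sup_t |F_x(t) - F_y(t)| for 1-D samples."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d = 0.0
    # Both empirical CDFs are right-continuous step functions, so the
    # supremum of their difference is attained at one of the pooled sample points.
    for t in xs + ys:
        f_x = bisect_right(xs, t) / n  # fraction of x-values <= t
        f_y = bisect_right(ys, t) / m  # fraction of y-values <= t
        d = max(d, abs(f_x - f_y))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Asymptotic (large-sample) critical value for the two-sample KS test."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return c_alpha * math.sqrt((n + m) / (n * m))
```

The hypothesis of equal distributions is rejected at level α when D exceeds the critical value. The asymptotic value is least reliable for small n and m, which is the setting the thesis's second contribution targets.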
Improving Country Conflict And Peace Modeling: Datasets, Imputations, And Hierarchical Clustering, Benjamin D. Leiby
Theses and Dissertations
Many disparate datasets exist that provide country attributes covering political, economic, and social aspects. Unfortunately, these data often do not cover all countries, nor are they complete for the countries they do include, as measured by each dataset's missingness. This research addresses these shortfalls in predicting country instability by considering country attributes across all aspects and at higher thresholds of missingness. First, a structured summary of past research is presented, framed by a causal taxonomy and functional ontology developed for this research. Additionally, a novel imputation technique for very large datasets is presented to account for moderate missingness in the expanded …
Innovative Heuristics To Improve The Latent Dirichlet Allocation Methodology For Textual Analysis And A New Modernized Topic Modeling Approach, Jamie T. Zimmerman
Theses and Dissertations
Natural Language Processing (NLP) is a complex method for mining the vast trove of documents created and made available every day. Topic modeling seeks to identify the topics within textual corpora with limited human input, in order to speed analysis. Current topic modeling techniques used in NLP have limitations in their pre-processing steps. This dissertation studies topic modeling techniques and those pre-processing limitations, and introduces new algorithms that improve on existing topic modeling techniques while remaining competitive in computational complexity. This research makes four contributions to the fields of NLP and topic modeling. …
Leveraging Machine Learning For Large Scale Analysis Of Publicly-Available Data For GNSS Interference Events, David K. Stamper
Theses and Dissertations
This research documents the architecture and implementation of an enhanced interference detection and classification analysis system that combines a database and storage solution with machine learning algorithms to detect changes in carrier-to-noise strength across multiple GNSS sites. The system uses publicly available, government-supported receivers to detect interference and is built with free and open-source software (FOSS), packaged as a Python programming library. Two algorithms for enhancing interference detection are discussed, using both non-machine-learning and machine learning approaches, and two further algorithms are discussed for classifying events. In addition, an approach to large-scale data analytics is demonstrated via a …
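A minimal sketch of what a non-machine-learning detector of carrier-to-noise changes might look like, assuming each site supplies an evenly sampled series of C/N0 values in dB-Hz on a shared epoch index. The trailing-median baseline, the 6 dB drop threshold, the `min_sites` coincidence rule, and all function names are hypothetical choices for illustration, not the algorithms described in the thesis.

```python
from collections import Counter
from statistics import median


def flag_cn0_drops(cn0_dbhz, window=10, threshold_db=6.0):
    """Indices where C/N0 falls more than threshold_db below the trailing-window median.

    Hypothetical rule for illustration; assumes an evenly sampled series in dB-Hz.
    """
    flagged = []
    for i in range(window, len(cn0_dbhz)):
        baseline = median(cn0_dbhz[i - window:i])  # robust to isolated outliers
        if baseline - cn0_dbhz[i] > threshold_db:
            flagged.append(i)
    return flagged


def coincident_events(site_series, min_sites=2, **detector_kwargs):
    """Epoch indices flagged at min_sites or more sites (hypothetical coincidence rule).

    site_series maps a site identifier to that site's C/N0 series; all series
    are assumed to share the same epoch indexing.
    """
    counts = Counter()
    for series in site_series.values():
        counts.update(set(flag_cn0_drops(series, **detector_kwargs)))
    return sorted(i for i, c in counts.items() if c >= min_sites)
```

Requiring agreement across several sites is one simple way to suppress drops caused by a single receiver's local conditions; a learned detector would replace the fixed threshold with a trained model.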
Generalized Robust Feature Selection, Bradford L. Lott
Theses and Dissertations
Feature selection may be summarized as identifying the features salient to a given response. Understanding which features affect the response makes it possible to collect only consequential data in the future; hence, a feature selection algorithm may save the effort spent collecting data as well as storage and computational resources for making predictions. We propose a generalized approach to selecting the salient features of datasets. Our approach may also be applied to unsupervised datasets to understand which data streams provide unique information. We contend that our approach identifies salient features that are robust to the subsequent predictive model applied. The proposed algorithm considers all provided …
Telemetry Data Mining For Unmanned Aircraft Systems, Li Yu
Theses and Dissertations
With ever more data becoming available to the US Air Force, it is vital to develop effective methods to leverage this strategic asset. Machine learning (ML) techniques offer a means of meeting this challenge, as these tools have been used successfully in commercial applications. For this research, three ML methods were applied to an unmanned aircraft system (UAS) telemetry dataset with the aim of extracting useful insight related to phases of flight. ML was shown to provide an advantage in exploratory data analysis as well as in classifying phases of flight. Neural network models demonstrated the best performance with over …
Constructing Prediction Intervals With Neural Networks: An Empirical Evaluation Of Bootstrapping And Conformal Inference Methods, Alexander N. Contarino
Theses and Dissertations
Artificial neural networks (ANNs) are popular tools for accomplishing many machine learning tasks, including predicting continuous outcomes. However, the general lack of confidence measures accompanying ANN predictions limits their applicability, especially in military settings where accuracy is paramount. Supplementing point predictions with prediction intervals (PIs) is common for other learning algorithms, but the complex structure and training of ANNs make constructing PIs difficult. This work identifies the network design choices and inferential methods that produce better-performing PIs with ANNs, to enable their adaptation for military use. A two-step experiment is executed across 11 datasets, including an image-based dataset. …
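Split conformal inference, one of the two method families named in the title, can be sketched in a few lines: fit a model on one subset, score absolute residuals on a held-out calibration subset, and widen each point prediction by the ⌈(n + 1)(1 - α)⌉-th smallest score. To keep the example self-contained, an ordinary least-squares line stands in for the ANN (an assumption; any fitted predictor could be substituted), and the function names are illustrative. When calibration and test points are exchangeable, the resulting interval has marginal coverage of at least 1 - α.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares line; a stand-in for the trained ANN."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def split_conformal_interval(x_train, y_train, x_cal, y_cal, x_new, alpha=0.1):
    """Split-conformal prediction interval at x_new with nominal coverage 1 - alpha."""
    slope, intercept = fit_line(x_train, y_train)

    def predict(x):
        return slope * x + intercept

    # Nonconformity scores: absolute residuals on the held-out calibration set.
    scores = sorted(abs(y - predict(x)) for x, y in zip(x_cal, y_cal))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    # With too few calibration points for the requested level, the interval is unbounded.
    q = scores[k - 1] if k <= n else math.inf
    y_hat = predict(x_new)
    return y_hat - q, y_hat + q
```

Bootstrapping, the other family in the title, would instead retrain the model on resampled data and read the interval off the spread of the resulting predictions; it is more expensive but does not require holding out a calibration set.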
Development Of Advanced Machine Learning Models For Analysis Of Plutonium Surrogate Optical Emission Spectra, Ashwin P. Rao, Phillip R. Jenkins, John D. Auxier II, Michael B. Shattan, Anil Patnaik
Faculty Publications
This work investigates and applies machine learning paradigms seldom seen in analytical spectroscopy to quantify gallium in cerium matrices by processing laser-plasma spectra. Ensemble regressions, support vector machine regressions, Gaussian kernel regressions, and artificial neural network techniques are trained and tested on cerium-gallium pellet spectra. A thorough hyperparameter optimization experiment is first conducted to determine the best design features for each model. The optimized models are evaluated for sensitivity and precision using the limit of detection (LoD) and root mean squared error of prediction (RMSEP) metrics, respectively. Gaussian kernel regression yields the best predictive model, with an RMSEP of …
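The two evaluation metrics can be made concrete as follows. RMSEP is the root mean squared difference between predicted and reference concentrations on held-out spectra. For LoD the sketch assumes the common 3σ convention (three times the standard deviation of blank-sample signals divided by the calibration slope), which may differ from the exact definition used in the paper; the function names are illustrative.

```python
import math
from statistics import stdev


def rmsep(y_true, y_pred):
    """Root mean squared error of prediction over a held-out test set."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def limit_of_detection(blank_signals, slope, k=3.0):
    """LoD = k * s_blank / slope (assumed 3-sigma convention; needs at least 2 blanks)."""
    return k * stdev(blank_signals) / slope
```

Lower values are better for both: a smaller RMSEP means more precise predictions, and a smaller LoD means the model can detect lower gallium concentrations.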