Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 19 of 19

Full-Text Articles in Physical Sciences and Mathematics

Modern Approaches And Theoretical Extensions To The Multivariate Kolmogorov Smirnov Test, Gonzalo Hernando Sep 2022

Theses and Dissertations

Most statistical tests are fully developed for univariate data, but when inference is required for multivariate data, univariate tests risk information loss and reduced interpretability. This research 1) derives and extends the multivariate Kolmogorov Smirnov test for 2 and into m-dimensions, 2) derives small sample critical values for the KS test that are not reliant on sample size simulations or correlation between variables, 3) extends large sample estimations and current KS implementations, and 4) provides sample size and power calculations in order to enable experimental design with respect to testing for differences in distributions. Through extensive simulation, we demonstrate that our …


Improving Country Conflict And Peace Modeling: Datasets, Imputations, And Hierarchical Clustering, Benjamin D. Leiby Sep 2022

Theses and Dissertations

Many disparate datasets exist that provide country attributes covering political, economic, and social aspects. Unfortunately, these data often do not include all countries, nor are they complete for the countries that are included, as measured by the dataset’s missingness. This research addresses these dataset shortfalls in predicting country instability by considering country attributes in all aspects as well as in greater thresholds of missingness. First, a structured summary of past research is presented, framed by a developed causal taxonomy and functional ontology. Additionally, a novel imputation technique for very large datasets is presented to account for moderate missingness in the expanded …


Three Case Studies Of Using Hybrid Model Machine Learning Techniques In Educational Data Mining To Improve The Classification Accuracies, Sujan Poudyal Aug 2022

Theses and Dissertations

A multitude of data is being produced from the increase in instructional technology, e-learning resources, and online courses. Educators could analyze this data to extract information beneficial to both instructors and students. Educational Data Mining (EDM) extracts hidden information from data contained within the educational domain. In data mining, a hybrid method is a combination of various machine learning techniques. In this dissertation, the novel use of hybrid machine learning techniques was explored in EDM using three educational case studies. First, in consideration for the importance of students’ attention, on and off-task data …


Quantifying Aboveground Biomass In A Tropical Forest Using A Lidar Waveform Weighted Allometric Model, Alejandro Rojas Aug 2022

Theses and Dissertations

Our knowledge of the distribution and amount of terrestrial aboveground biomass (AGB) has increased using lidar technology. Recent advancements in satellite lidar have enabled global mapping of forest biomass and structure. However, there are large biases in satellite lidar estimates, which impact our understanding of carbon dynamics, particularly in tropical forests.

Ni-Meister et al. (2022) developed a lidar full waveform weighted height-based allometric model which produced very good results in temperate deciduous/conifer forest in the continental US. The purpose of this study was to evaluate this biomass model in an African tropical forest using the Land Vegetation and Ice …


Neural Networks And Stochastic Differential Equations, Stephanie L. Flores Aug 2022

Theses and Dissertations

Influenced by the seminal work “Physics Informed Neural Networks” by Raissi et al., 2017, there has been growing interest in recent years in solving Nonlinear Partial Differential Equations (PDE) and estimating their parameters with deep neural networks. This has broadened the pathways and shed light on deep learning of stochastic differential equations (SDE) and stochastic PDEs (SPDE). In this work, we intend to investigate the current approaches to solving and estimating the parameters of SDE/SPDE with deep neural networks and the possibility of extending them to obtain more accurate/stable solutions with residual systems and/or generative adversarial neural networks. …


Innovative Heuristics To Improve The Latent Dirichlet Allocation Methodology For Textual Analysis And A New Modernized Topic Modeling Approach, Jamie T. Zimmerman Jun 2022

Theses and Dissertations

Natural Language Processing is a complex method of data mining the vast trove of documents created and made available every day. Topic modeling seeks to identify the topics within textual corpora with limited human input into the process to speed analysis. Current topic modeling techniques used in Natural Language Processing have limitations in the pre-processing steps. This dissertation studies topic modeling techniques and their pre-processing limitations, and introduces new algorithms that improve on existing topic modeling techniques while remaining competitive in computational complexity. This research introduces four contributions to the field of Natural Language Processing and topic modeling. …


Impact Of Climate Oscillations/Indices On Hydrological Variables In The Mississippi River Valley Alluvial Aquifer., Meena Raju May 2022

Theses and Dissertations

The Mississippi River Valley Alluvial Aquifer (MRVAA) is one of the most productive agricultural regions in the United States. The main objectives of this research are to identify long-term trends and change points in hydrological variables (streamflow and rainfall), to assess the relationship between hydrological variables, and to evaluate the influence of global climate indices on hydrological variables. Non-parametric tests (MMK and Pettitt’s tests) were used to analyze trends and change points. PCC and streamflow elasticity analysis were used to analyze the relationship between streamflow and rainfall and the sensitivity of streamflow to rainfall changes. PCC and MLR analysis …


Constructing Prediction Intervals With Neural Networks: An Empirical Evaluation Of Bootstrapping And Conformal Inference Methods, Alexander N. Contarino Mar 2022

Theses and Dissertations

Artificial neural networks (ANNs) are popular tools for accomplishing many machine learning tasks, including predicting continuous outcomes. However, the general lack of confidence measures provided with ANN predictions limits their applicability, especially in military settings where accuracy is paramount. Supplementing point predictions with prediction intervals (PIs) is common for other learning algorithms, but the complex structure and training of ANNs render constructing PIs difficult. This work provides the network design choices and inferential methods for creating better performing PIs with ANNs to enable their adaptation for military use. A two-step experiment is executed across 11 datasets, including an image-based dataset. …


Generalized Robust Feature Selection, Bradford L. Lott Mar 2022

Theses and Dissertations

Feature selection may be summarized as identifying the features salient to a given response. Understanding which features affect the response makes it possible, in the future, to collect only consequential data; hence, a feature selection algorithm may save data collection effort and storage resources, as well as computational resources for making predictions. We propose a generalized approach to select the salient features of data sets. Our approach may also be applied to unsupervised datasets to understand which data streams provide unique information. We contend our approach identifies salient features robust to the subsequent predictive model applied. The proposed algorithm considers all provided …


Telemetry Data Mining For Unmanned Aircraft Systems, Li Yu Mar 2022

Theses and Dissertations

With ever more data becoming available to the US Air Force, it is vital to develop effective methods to leverage this strategic asset. Machine learning (ML) techniques present a means of meeting this challenge, as these tools have demonstrated successful use in commercial applications. For this research, three ML methods were applied to an unmanned aircraft system (UAS) telemetry dataset with the aim of extracting useful insight related to phases of flight. It was shown that ML provides an advantage in exploratory data analysis as well as classification of phases. Neural network models demonstrated the best performance with over …


Leveraging Machine Learning For Large Scale Analysis Of Publicly-Available Data For Gnss Interference Events, David K. Stamper Mar 2022

Theses and Dissertations

This research documents the architecture and implementation of an enhanced interference detection and classification analysis system, using both a database and storage solution utilizing machine learning algorithms to detect changes in Carrier-to-Noise strength over multiple GNSS sites. The system uses publicly-available, government-supported receivers to detect interference, and is built using FOSS packaged as a programming library through Python. Two algorithms are discussed in terms of enhancing interference detection using both non-machine learning and machine learning approaches. Two algorithms are also discussed which are used for classification of events. In addition, an approach to Large Scale data analytics is demonstrated via a …


Transition Metal Phosphides For High Performance Electrochemical Energy Storage Devices, Amina Saleh Jan 2022

Theses and Dissertations

Electrochemical energy storage technologies are nowadays playing a leading role in the global effort to address the energy challenges. A lot of attention has been devoted to designing hybrid devices known as supercapatteries which combine the merits of supercapacitors (high power density) and rechargeable batteries (high energy density). Transition metal phosphides (TMP) are a rising star for supercapattery anode materials thanks to their high conductivity, metalloid characteristics, and kinetic favorability for fast electron transport. Herein, new TMP-based materials were synthesized for use as supercapattery positive electrodes, via a multifaceted approach to yield devices enjoying concurrently high power and energy densities. …


Multi-Modality Automatic Lung Tumor Segmentation Method Using Deep Learning And Radiomics, Siqiu Wang Jan 2022

Theses and Dissertations

Delineation of the tumor volume is the initial and fundamental step in the radiotherapy planning process. The current clinical practice of manual delineation is time-consuming and suffers from observer variability. This work seeks to develop an effective automatic framework to produce clinically usable lung tumor segmentations. First, to facilitate the development and validation of our methodology, an expansive database of planning CTs, diagnostic PETs, and manual tumor segmentations was curated, and an image registration and preprocessing pipeline was established. Then a deep learning neural network was constructed and optimized to utilize dual-modality PET and CT images for lung tumor segmentation. …


A Study On Developing Novel Methods For Relation Extraction, Darshini Mahendran Jan 2022

Theses and Dissertations

Relation Extraction (RE) is a task of Natural Language Processing (NLP) to detect and classify the relations between two entities. Relation extraction in the biomedical and scientific literature domain is challenging as text can contain multiple pairs of entities in the same instance. During the course of this research, we developed an RE framework (RelEx), which consists of five main RE paradigms: rule-based, machine learning-based, Convolutional Neural Network (CNN)-based, Bidirectional Encoder Representations from Transformers (BERT)-based, and Graph Convolutional Networks (GCNs)-based approaches. RelEx's rule-based approach uses co-location information of the entities to determine whether a relation exists between a selected entity …


Temporal Disambiguation Of Relative Temporal Expressions In Clinical Texts Using Temporally Fine-Tuned Contextual Word Embeddings., Amy L. Olex Jan 2022

Theses and Dissertations

Temporal reasoning is the ability to extract and assimilate temporal information to reconstruct a series of events such that they can be reasoned over to answer questions involving time. Temporal reasoning in the clinical domain is challenging due to specialized medical terms and nomenclature, shorthand notation, fragmented text, a variety of writing styles used by different medical units, redundancy of information that has to be reconciled, and an increased number of temporal references as compared to general domain texts. Work in the area of clinical temporal reasoning has progressed, but the current state-of-the-art still has a ways to go before …


Estimating Weighted Panel Sizes For Primary Care Providers: An Assessment Of Clustering And Novel Methods Of Panel Size Estimation On Electronic Medical Records, Martin A. Lavallee Jan 2022

Theses and Dissertations

Primary care is on the frontlines of healthcare and thus sees the most diverse set of patients. In order to achieve high functioning primary care, a practice must establish empanelment, the pairing of patients to providers. Enumeration of empanelment, or estimating panel sizes, helps ensure that the demands of the patients meet the supply of providers and optimize the balance of primary care resources to improve quality of care. Further, we can adjust panel sizes by using patient-level data on healthcare utilization and complexity extracted from the electronic medical record to determine the amount of care or burden of work …


Incorporating Ontological Information In Biomedical Entity Linking Of Phrases In Clinical Text, Evan French Jan 2022

Theses and Dissertations

Biomedical Entity Linking (BEL) is the task of mapping spans of text within biomedical documents to normalized, unique identifiers within an ontology. Translational application of BEL on clinical notes has enormous potential for augmenting discretely captured data in electronic health records, but the existing paradigm for evaluating BEL systems developed in academia is not well aligned with real-world use cases. In this work, we demonstrate a proof of concept for incorporating ontological similarity into the training and evaluation of BEL systems to begin to rectify this misalignment. This thesis has two primary components: 1) a comprehensive literature review and 2) …


Universal Design In Bci: Deep Learning Approaches For Adaptive Speech Brain-Computer Interfaces, Srdjan Lesaja Jan 2022

Theses and Dissertations

In the last two decades, there have been many breakthrough advancements in non-invasive and invasive brain-computer interface (BCI) systems. However, the majority of BCI model designs still follow a paradigm whereby neural signals are preprocessed and task-related features extracted using static, and generally customized, data-independent designs. Such BCI designs commonly optimize narrow task performance over generalizability, adaptability, and robustness, which is not well suited to meeting individual user needs. If one day BCIs are to be capable of decoding our higher-order cognitive commands and conceptual maps, their designs will need to be adaptive architectures that will evolve and grow in …


Computational Analysis Of Drug Targets And Prediction Of Protein-Compound Interactions, Sina Ghadermarzi Jan 2022

Theses and Dissertations

Computational prediction of compound-protein interactions has generated substantial interest in recent years owing to the importance of knowledge of these interactions for drug discovery and drug repurposing efforts. Research suggests that the currently known drug targets constitute only a fraction of a complete set of drug targets, limiting our ability to identify suitable targets to develop new drugs or to repurpose current drugs for new diseases. These efforts are further thwarted by our limited knowledge of protein-drug (and more generally protein-compound) interactions, where only a subset of drug targets is typically known for the currently used …