Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Articles 1 - 19 of 19
Full-Text Articles in Physical Sciences and Mathematics
Modern Approaches And Theoretical Extensions To The Multivariate Kolmogorov Smirnov Test, Gonzalo Hernando
Theses and Dissertations
Most statistical tests are fully developed for univariate data, but when inference is required for multivariate data, applying univariate tests risks information loss and reduced interpretability. This research 1) derives and extends the multivariate Kolmogorov-Smirnov (KS) test to two dimensions and more generally to m dimensions, 2) derives small-sample critical values for the KS test that do not rely on sample-size simulations or on the correlation between variables, 3) extends large-sample estimations and current KS implementations, and 4) provides sample-size and power calculations to enable experimental design when testing for differences in distributions. Through extensive simulation, we demonstrate that our …
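To show the statistic being extended, here is a minimal Python sketch of the classical two-sample KS statistic D = sup_t |F_1(t) - F_2(t)|. It is paired with one simple quadrant-based 2-D generalization, in which each pooled sample point serves as an origin and the largest difference in quadrant fractions is taken. This is an illustration under our own assumptions, not the dissertation's derivation. It computes no critical values or p-values, and the function names are hypothetical.

```python
import bisect


def ks_two_sample(x, y):
    """Two-sample KS statistic D = sup_t |F_x(t) - F_y(t)| for 1-D samples."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    # The supremum is attained at one of the pooled sample points,
    # so it suffices to evaluate both empirical CDFs there.
    for t in xs + ys:
        fx = bisect.bisect_right(xs, t) / len(xs)
        fy = bisect.bisect_right(ys, t) / len(ys)
        d = max(d, abs(fx - fy))
    return d


def _quadrant_fractions(points, ox, oy):
    """Fraction of `points` in each of the four quadrants around (ox, oy)."""
    counts = [0, 0, 0, 0]
    for x, y in points:
        # Quadrant index 0..3 from the two comparisons; each point lands in exactly one.
        counts[2 * (x > ox) + (y > oy)] += 1
    return [c / len(points) for c in counts]


def ks_2d_quadrant(a, b):
    """Quadrant-based 2-D KS statistic: each pooled point serves as an origin,
    and the largest difference in quadrant fractions is returned."""
    d = 0.0
    for ox, oy in list(a) + list(b):
        fa = _quadrant_fractions(a, ox, oy)
        fb = _quadrant_fractions(b, ox, oy)
        d = max(d, max(abs(p - q) for p, q in zip(fa, fb)))
    return d
```

For example, ks_two_sample([1, 2, 3, 4], [3, 4, 5, 6]) gives 0.5. Note that the 2-D statistic depends on how quadrants and origins are chosen, which is one reason multivariate KS critical values are harder to derive than univariate ones.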
Improving Country Conflict And Peace Modeling: Datasets, Imputations, And Hierarchical Clustering, Benjamin D. Leiby
Theses and Dissertations
Many disparate datasets provide country attributes covering political, economic, and social aspects. Unfortunately, these data often do not include all countries, nor are they complete for the countries included, as measured by each dataset’s missingness. This research addresses these dataset shortfalls in predicting country instability by considering country attributes across all aspects and at greater thresholds of missingness. First, a structured summary of past research is presented, framed by a causal taxonomy and functional ontology developed for this work. Additionally, a novel imputation technique for very large datasets is presented to account for moderate missingness in the expanded …
Three Case Studies Of Using Hybrid Model Machine Learning Techniques In Educational Data Mining To Improve The Classification Accuracies, Sujan Poudyal
Theses and Dissertations
A multitude of data is being produced as instructional technology, e-learning resources, and online courses increase. Educators could analyze these data to extract useful information that benefits both instructors and students. Educational Data Mining (EDM) extracts hidden information from data within the educational domain. In data mining, a hybrid method combines several machine learning techniques. This dissertation explores the novel use of hybrid machine learning techniques in EDM through three educational case studies. First, considering the importance of students’ attention, on- and off-task data …
Quantifying Aboveground Biomass In A Tropical Forest Using A Lidar Waveform Weighted Allometric Model, Alejandro Rojas
Theses and Dissertations
Lidar technology has increased our knowledge of the distribution and amount of terrestrial aboveground biomass (AGB). Recent advancements in satellite lidar have enabled global mapping of forest biomass and structure. However, satellite lidar estimates carry large biases, which affect our understanding of carbon dynamics, particularly in tropical forests.
Ni-Meister et al. (2022) developed a lidar full-waveform weighted, height-based allometric model that produced very good results in temperate deciduous and conifer forests in the continental US. The purpose of this study was to evaluate this biomass model in an African tropical forest using the Land Vegetation and Ice …
Neural Networks And Stochastic Differential Equations, Stephanie L. Flores
Theses and Dissertations
Influenced by the seminal work “Physics Informed Neural Networks” by Raissi et al. (2017), there has been growing interest in recent years in solving nonlinear partial differential equations (PDEs) and estimating their parameters with deep neural networks. This has broadened the pathways and shed light on deep learning of stochastic differential equations (SDEs) and stochastic PDEs (SPDEs). In this work, we intend to investigate current approaches to solving SDEs/SPDEs and estimating their parameters with deep neural networks, as well as the possibility of extending them to obtain more accurate and stable solutions with residual systems and/or generative adversarial networks. …
Innovative Heuristics To Improve The Latent Dirichlet Allocation Methodology For Textual Analysis And A New Modernized Topic Modeling Approach, Jamie T. Zimmerman
Theses and Dissertations
Natural Language Processing is a complex method of mining the vast trove of documents created and made available every day. Topic modeling seeks to identify the topics within textual corpora with limited human input, to speed analysis. Current topic modeling techniques used in Natural Language Processing have limitations in their pre-processing steps. This dissertation studies topic modeling techniques and those pre-processing limitations, and introduces new algorithms that improve on existing topic modeling techniques while remaining competitive in computational complexity. This research introduces four contributions to the field of Natural Language Processing and topic modeling. …
Impact Of Climate Oscillations/Indices On Hydrological Variables In The Mississippi River Valley Alluvial Aquifer., Meena Raju
Theses and Dissertations
The Mississippi River Valley Alluvial Aquifer (MRVAA) is one of the most productive agricultural regions in the United States. The main objectives of this research are to identify long-term trends and change points in hydrological variables (streamflow and rainfall), to assess the relationship between these variables, and to evaluate the influence of global climate indices on them. Two non-parametric tests, MMK and Pettitt’s test, were used to detect trends and change points, respectively. PCC and streamflow elasticity analysis were used to analyze the relationship between streamflow and rainfall and the sensitivity of streamflow to rainfall changes. PCC and MLR analysis …
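To illustrate the two non-parametric tools named above, here is a minimal sketch of the classical Mann-Kendall trend statistic and Pettitt's change-point statistic. We assume MMK denotes a modified Mann-Kendall test. The sketch implements only the unmodified test, with no tie or autocorrelation correction, and uses the usual approximate p-value for Pettitt's K. Function names are hypothetical, and the loops are written for clarity, not speed.

```python
import math


def _sign(v):
    """Return 1, 0, or -1 according to the sign of v."""
    return (v > 0) - (v < 0)


def mann_kendall(x):
    """Classical Mann-Kendall trend test (no tie or autocorrelation correction).

    Returns (S, Z); |Z| > 1.96 suggests a monotonic trend at roughly the
    5% level (two-sided).
    """
    n = len(x)
    s = sum(_sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z


def pettitt(x):
    """Pettitt change-point test.

    Returns (t, K, p): the most likely change lies between x[t - 1] and x[t],
    K = max_t |U_t|, and p is the usual approximation 2 exp(-6K^2 / (n^3 + n^2)).
    """
    n = len(x)
    best_t, k = 0, 0
    for t in range(1, n):
        # U_t compares every point before the split with every point after it.
        u = sum(_sign(x[i] - x[j]) for i in range(t) for j in range(t, n))
        if abs(u) > k:
            best_t, k = t, abs(u)
    p = min(1.0, 2 * math.exp(-6 * k ** 2 / (n ** 3 + n ** 2)))
    return best_t, k, p
```

For a strictly increasing series such as [1, 2, 3, 4, 5], every pair is concordant, so S = 10. For [1, 1, 1, 5, 5, 5], Pettitt's statistic places the split at index 3.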
Constructing Prediction Intervals With Neural Networks: An Empirical Evaluation Of Bootstrapping And Conformal Inference Methods, Alexander N. Contarino
Theses and Dissertations
Artificial neural networks (ANNs) are popular tools for many machine learning tasks, including predicting continuous outcomes. However, the general lack of confidence measures accompanying ANN predictions limits their applicability, especially in military settings where accuracy is paramount. Supplementing point predictions with prediction intervals (PIs) is common for other learning algorithms, but the complex structure and training of ANNs make constructing PIs difficult. This work identifies the network design choices and inferential methods that produce better-performing PIs with ANNs, to enable their adoption for military use. A two-step experiment is executed across 11 datasets, including an image-based dataset. …
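Conformal inference is one of the method families evaluated. Here is a minimal sketch of split conformal prediction: fit a model on one subset, compute absolute residuals on a held-out calibration set, and widen each point prediction by the ⌈(n+1)(1−α)⌉-th smallest residual. For brevity, an ordinary least-squares line stands in for the neural network. That substitution, and the function names, are assumptions for illustration, not the study's setup.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a stand-in for a neural network)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def split_conformal(train, calib, alpha=0.1):
    """Return a function mapping x to a (lo, hi) prediction interval with
    approximately (1 - alpha) marginal coverage under exchangeability."""
    model = fit_line([x for x, _ in train], [y for _, y in train])
    # Nonconformity scores: absolute residuals on the held-out calibration set.
    scores = sorted(abs(y - model(x)) for x, y in calib)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    q = math.inf if k > n else scores[k - 1]  # too few points -> unbounded interval
    return lambda x: (model(x) - q, model(x) + q)
```

With 10 calibration points and alpha = 0.2, the interval half-width is the 9th smallest residual. With alpha = 0.01, there are too few calibration points, and the interval is unbounded. The coverage guarantee does not depend on which model is used. For a neural network, one would replace fit_line with the trained network's predict function.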
Generalized Robust Feature Selection, Bradford L. Lott
Theses and Dissertations
Feature selection may be summarized as identifying the features salient to a given response. Understanding which features affect the response makes it possible to collect only consequential data in the future; hence, feature selection may save data-collection effort, storage resources, and the computational resources needed for making predictions. We propose a generalized approach to selecting the salient features of datasets. Our approach may also be applied to unsupervised datasets to understand which data streams provide unique information. We contend that our approach identifies salient features that are robust to the subsequent predictive model applied. The proposed algorithm considers all provided …
Telemetry Data Mining For Unmanned Aircraft Systems, Li Yu
Theses and Dissertations
With ever more data becoming available to the US Air Force, it is vital to develop effective methods to leverage this strategic asset. Machine learning (ML) techniques offer a means of meeting this challenge, as these tools have been used successfully in commercial applications. In this research, three ML methods were applied to an unmanned aircraft system (UAS) telemetry dataset with the aim of extracting useful insight related to phases of flight. ML was shown to provide an advantage in exploratory data analysis as well as in classifying flight phases. Neural network models demonstrated the best performance, with over …
Leveraging Machine Learning For Large Scale Analysis Of Publicly-Available Data For GNSS Interference Events, David K. Stamper
Theses and Dissertations
This research documents the architecture and implementation of an enhanced interference detection and classification analysis system, which uses a database and storage solution together with machine learning algorithms to detect changes in carrier-to-noise strength across multiple GNSS sites. The system uses publicly available, government-supported receivers to detect interference and is built with free and open-source software (FOSS), packaged as a Python programming library. Two algorithms for enhancing interference detection are discussed, using both non-machine learning and machine learning approaches. Two further algorithms used to classify events are also discussed. In addition, an approach to large-scale data analytics is demonstrated via a …
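To make the non-machine-learning side of interference detection concrete, here is a hypothetical baseline detector. It flags epochs whose carrier-to-noise density (C/N0, in dB-Hz) falls more than a threshold below the median of the preceding epochs. The window length, the 6 dB threshold, and the function name are arbitrary assumptions. This is not either of the detection algorithms described in the thesis.

```python
from statistics import median


def cn0_drops(cn0, window=5, threshold_db=6.0):
    """Flag epochs whose C/N0 (dB-Hz) falls more than `threshold_db`
    below the median of the preceding `window` epochs."""
    flags = []
    for i, value in enumerate(cn0):
        if i < window:
            flags.append(False)  # not enough history for a baseline yet
            continue
        # A rolling median keeps the baseline stable against brief dips.
        baseline = median(cn0[i - window:i])
        flags.append(baseline - value > threshold_db)
    return flags
```

A series that holds near 45 dB-Hz and then briefly drops to 30 dB-Hz is flagged only during the drop. In a multi-site system, the same check could be run per receiver and satellite, with coincident flags across sites treated as stronger evidence of interference.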
Transition Metal Phosphides For High Performance Electrochemical Energy Storage Devices, Amina Saleh
Theses and Dissertations
Electrochemical energy storage technologies now play a leading role in the global effort to address energy challenges. Much attention has been devoted to designing hybrid devices, known as supercapatteries, that combine the merits of supercapacitors (high power density) and rechargeable batteries (high energy density). Transition metal phosphides (TMPs) are a rising star among supercapattery anode materials thanks to their high conductivity, metalloid characteristics, and kinetic favorability for fast electron transport. Herein, new TMP-based materials were synthesized for use as supercapattery positive electrodes via a multifaceted approach, yielding devices with simultaneously high power and energy densities. …
Multi-Modality Automatic Lung Tumor Segmentation Method Using Deep Learning And Radiomics, Siqiu Wang
Theses and Dissertations
Delineation of the tumor volume is the initial and fundamental step in the radiotherapy planning process. The current clinical practice of manual delineation is time-consuming and suffers from observer variability. This work seeks to develop an effective automatic framework to produce clinically usable lung tumor segmentations. First, to facilitate the development and validation of our methodology, an expansive database of planning CTs, diagnostic PETs, and manual tumor segmentations was curated, and an image registration and preprocessing pipeline was established. Then a deep learning neural network was constructed and optimized to utilize dual-modality PET and CT images for lung tumor segmentation. …
A Study On Developing Novel Methods For Relation Extraction, Darshini Mahendran
Theses and Dissertations
Relation Extraction (RE) is the Natural Language Processing (NLP) task of detecting and classifying the relations between two entities. Relation extraction in biomedical and scientific literature is challenging because text can contain multiple pairs of entities in the same instance. In the course of this research, we developed an RE framework (RelEx) comprising five main RE paradigms: rule-based, machine learning-based, Convolutional Neural Network (CNN)-based, Bidirectional Encoder Representations from Transformers (BERT)-based, and Graph Convolutional Network (GCN)-based approaches. RelEx's rule-based approach uses co-location information of the entities to determine whether a relation exists between a selected entity …
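As a rough illustration of what a co-location rule can look like, the sketch below proposes a relation for an entity pair when three conditions hold: the pair's type combination is allowed, both entities occur in the same sentence, and they lie within a token window. The Entity record, the entity type labels, and the window size are hypothetical. The abstract does not spell out RelEx's actual rules.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Entity:
    text: str
    label: str     # entity type, e.g. "Drug" (hypothetical labels)
    sentence: int  # index of the sentence containing the entity
    token: int     # token offset of the entity's first token


def colocated_pairs(entities, allowed, max_distance=10):
    """Propose (head, tail) relations for entity pairs whose type pair is in
    `allowed`, that share a sentence, and that lie within `max_distance` tokens."""
    out = []
    for a, b in combinations(entities, 2):
        if (a.label, b.label) not in allowed:
            if (b.label, a.label) in allowed:
                a, b = b, a  # orient the pair as (head type, tail type)
            else:
                continue
        if a.sentence == b.sentence and abs(a.token - b.token) <= max_distance:
            out.append((a.text, b.text))
    return out
```

Rules like this favor recall over precision. In a multi-entity sentence, every allowed pair within the window is proposed. This is the difficulty the abstract notes for biomedical text.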
Temporal Disambiguation Of Relative Temporal Expressions In Clinical Texts Using Temporally Fine-Tuned Contextual Word Embeddings., Amy L. Olex
Theses and Dissertations
Temporal reasoning is the ability to extract and assimilate temporal information to reconstruct a series of events such that they can be reasoned over to answer questions involving time. Temporal reasoning in the clinical domain is challenging due to specialized medical terms and nomenclature, shorthand notation, fragmented text, a variety of writing styles used by different medical units, redundancy of information that has to be reconciled, and an increased number of temporal references compared with general-domain texts. Work in the area of clinical temporal reasoning has progressed, but the current state of the art still has a long way to go before …
Estimating Weighted Panel Sizes For Primary Care Providers: An Assessment Of Clustering And Novel Methods Of Panel Size Estimation On Electronic Medical Records, Martin A. Lavallee
Theses and Dissertations
Primary care is on the front lines of healthcare and therefore sees the most diverse set of patients. To achieve high-functioning primary care, a practice must establish empanelment, the pairing of patients to providers. Enumerating empanelment, or estimating panel sizes, helps ensure that patient demand matches provider supply and optimizes the balance of primary care resources to improve quality of care. Further, panel sizes can be adjusted using patient-level data on healthcare utilization and complexity extracted from the electronic medical record to determine the amount of care or burden of work …
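Here is a minimal sketch of a utilization-weighted panel size. It assumes each patient's weight is their annual visit count divided by the practice-wide mean. Under this weighting, an average-utilization patient counts as 1.0, and the weighted panels sum to the total headcount. The (provider, visits) input format, the visit-only weighting, and the function name are assumptions. The dissertation's clustering-based and novel estimators are not reproduced here.

```python
from collections import defaultdict


def weighted_panel_sizes(patients):
    """patients: iterable of (provider_id, annual_visits) pairs.

    Each patient's weight is their utilization relative to the practice-wide
    mean, so a patient with average utilization counts as 1.0.
    Assumes the mean visit count is nonzero.
    """
    patients = list(patients)
    mean_visits = sum(v for _, v in patients) / len(patients)
    panels = defaultdict(float)
    for provider, visits in patients:
        panels[provider] += visits / mean_visits
    return dict(panels)
```

For example, a provider with two patients making 2 and 6 visits, in a practice averaging 4 visits, has a weighted panel of 2.0. That equals the raw headcount here, but a provider whose patients all use more care than average would have a weighted panel larger than their headcount.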
Incorporating Ontological Information In Biomedical Entity Linking Of Phrases In Clinical Text, Evan French
Theses and Dissertations
Biomedical Entity Linking (BEL) is the task of mapping spans of text within biomedical documents to normalized, unique identifiers within an ontology. Translational application of BEL on clinical notes has enormous potential for augmenting discretely captured data in electronic health records, but the existing paradigm for evaluating BEL systems developed in academia is not well aligned with real-world use cases. In this work, we demonstrate a proof of concept for incorporating ontological similarity into the training and evaluation of BEL systems to begin to rectify this misalignment. This thesis has two primary components: 1) a comprehensive literature review and 2) …
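One way to credit near misses when evaluating entity linking is to score each prediction by its ontological similarity to the gold concept, instead of by exact match. The sketch below uses Wu-Palmer similarity over a toy single-inheritance is-a tree. The concept names, the tree, and the choice of Wu-Palmer are illustrative assumptions, not the thesis's method. Real biomedical ontologies often allow multiple parents.

```python
def ancestors(concept, parent):
    """Path from `concept` up to the root (inclusive) in a single-parent is-a tree."""
    path = [concept]
    while concept in parent:
        concept = parent[concept]
        path.append(concept)
    return path


def wu_palmer(a, b, parent):
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b)), root depth = 1."""
    def depth(c):
        return len(ancestors(c, parent))

    path_a = ancestors(a, parent)
    seen_b = set(ancestors(b, parent))
    # Lowest common subsumer: first ancestor of `a` that is also an ancestor of `b`.
    # Assumes both concepts share a root; otherwise next() raises StopIteration.
    lcs = next(c for c in path_a if c in seen_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def soft_accuracy(predicted, gold, parent):
    """Mean ontological similarity between predicted and gold concepts."""
    return sum(wu_palmer(p, g, parent) for p, g in zip(predicted, gold)) / len(gold)
```

Under exact-match accuracy, predicting a sibling concept (for example, asthma instead of pneumonia) scores zero. Here it earns partial credit proportional to how deep the shared ancestor is, while a prediction in an unrelated branch earns little.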
Universal Design In BCI: Deep Learning Approaches For Adaptive Speech Brain-Computer Interfaces, Srdjan Lesaja
Theses and Dissertations
In the last two decades, there have been many breakthrough advancements in non-invasive and invasive brain-computer interface (BCI) systems. However, the majority of BCI model designs still follow a paradigm whereby neural signals are preprocessed and task-related features extracted using static, and generally customized, data-independent designs. Such BCI designs commonly optimize narrow task performance over generalizability, adaptability, and robustness, which is not well suited to meeting individual user needs. If one day BCIs are to be capable of decoding our higher-order cognitive commands and conceptual maps, their designs will need to be adaptive architectures that will evolve and grow in …
Computational Analysis Of Drug Targets And Prediction Of Protein-Compound Interactions, Sina Ghadermarzi
Theses and Dissertations
Computational prediction of compound-protein interactions has generated substantial interest in recent years owing to the importance of knowing these interactions for drug discovery and drug repurposing efforts. Research suggests that the currently known drug targets constitute only a fraction of the complete set of drug targets, limiting our ability to identify suitable targets for developing new drugs or repurposing current drugs for new diseases. These efforts are further thwarted by our limited knowledge of protein-drug (and, more generally, protein-compound) interactions, where only a subset of drug targets is typically known for the currently used …