Open Access. Powered by Scholars. Published by Universities.®

Data Science Commons

Virginia Commonwealth University

Articles 1 - 20 of 20

Full-Text Articles in Data Science

Adaptive Multi-Label Classification On Drifting Data Streams, Martha Roseberry Jan 2024

Theses and Dissertations

Drifting data streams and multi-label data are both challenging problems. When multi-label data arrives as a stream, the challenges of both problems must be addressed along with additional challenges unique to the combined problem. Algorithms must be fast and flexible, able to match both the speed and evolving nature of the stream. We propose four methods for learning from multi-label drifting data streams. First, a multi-label k Nearest Neighbors with Self Adjusting Memory (ML-SAM-kNN) exploits short- and long-term memories to predict the current and evolving states of the data stream. Second, a punitive k nearest neighbors algorithm with a self-adjusting …


Analytical Approach For Monitoring The Behavior Of Patients With Pancreatic Adenocarcinoma At Different Stages As A Function Of Time, Aditya Chakaborty Dr, Chris P. Tsokos Dr May 2023

Biology and Medicine Through Mathematics Conference

No abstract provided.


Face Anti-Spoofing And Deep Learning Based Unsupervised Image Recognition Systems, Enoch Solomon Jan 2023

Theses and Dissertations

One of the main problems of a supervised deep learning approach is that it requires large amounts of labeled training data, which are not always easily available. This PhD dissertation addresses this problem with a novel unsupervised deep learning face verification system called UFace, which does not require labeled training data because it automatically generates training data, in an unsupervised way, from even a relatively small dataset. The method starts by selecting, in an unsupervised way, the k most similar and k most dissimilar images for a given face image. Moreover, this PhD dissertation proposes a new loss function to …
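
The selection step mentioned in this abstract can be illustrated with a minimal sketch. It assumes each image is already represented by an embedding vector and that cosine similarity is the similarity measure; neither assumption comes from the abstract, and the function names `cosine` and `select_pairs` are hypothetical, not taken from UFace.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def select_pairs(embeddings, anchor, k):
    """Return indices of the k most similar and k most dissimilar items.

    Assumption: similarity is cosine similarity on precomputed embeddings.
    """
    scored = sorted(
        ((cosine(embeddings[anchor], e), j)
         for j, e in enumerate(embeddings) if j != anchor),
        reverse=True,
    )
    most_similar = [j for _, j in scored[:k]]
    most_dissimilar = [j for _, j in scored[-k:]]
    return most_similar, most_dissimilar

# Toy 2-D "embeddings"; real face embeddings would be much higher-dimensional.
toy = [(1.0, 0.0), (0.9, 0.1), (0.8, 0.3), (0.0, 1.0), (-1.0, 0.0)]
similar, dissimilar = select_pairs(toy, anchor=0, k=2)
print(similar, dissimilar)
```

The two index lists could then serve as positive and negative examples for the anchor image, which is one plausible reading of how unlabeled data might yield training pairs.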


Inferring Dynamics Of Biological Systems, Tracey G. Oellerich May 2022

Biology and Medicine Through Mathematics Conference

No abstract provided.


Multi-Modality Automatic Lung Tumor Segmentation Method Using Deep Learning And Radiomics, Siqiu Wang Jan 2022

Theses and Dissertations

Delineation of the tumor volume is the initial and fundamental step in the radiotherapy planning process. The current clinical practice of manual delineation is time-consuming and suffers from observer variability. This work seeks to develop an effective automatic framework to produce clinically usable lung tumor segmentations. First, to facilitate the development and validation of our methodology, an expansive database of planning CTs, diagnostic PETs, and manual tumor segmentations was curated, and an image registration and preprocessing pipeline was established. Then a deep learning neural network was constructed and optimized to utilize dual-modality PET and CT images for lung tumor segmentation. …


Incorporating Ontological Information In Biomedical Entity Linking Of Phrases In Clinical Text, Evan French Jan 2022

Theses and Dissertations

Biomedical Entity Linking (BEL) is the task of mapping spans of text within biomedical documents to normalized, unique identifiers within an ontology. Translational application of BEL on clinical notes has enormous potential for augmenting discretely captured data in electronic health records, but the existing paradigm for evaluating BEL systems developed in academia is not well aligned with real-world use cases. In this work, we demonstrate a proof of concept for incorporating ontological similarity into the training and evaluation of BEL systems to begin to rectify this misalignment. This thesis has two primary components: 1) a comprehensive literature review and 2) …


Estimating Weighted Panel Sizes For Primary Care Providers: An Assessment Of Clustering And Novel Methods Of Panel Size Estimation On Electronic Medical Records, Martin A. Lavallee Jan 2022

Theses and Dissertations

Primary care is on the frontlines of healthcare, so its providers see the most diverse set of patients. To achieve high-functioning primary care, a practice must establish empanelment, the pairing of patients to providers. Enumeration of empanelment, or estimating panel sizes, helps ensure that the demands of the patients match the supply of providers and optimize the balance of primary care resources to improve quality of care. Further, we can adjust panel sizes by using patient-level data on healthcare utilization and complexity extracted from the electronic medical record to determine the amount of care or burden of work …


Universal Design In BCI: Deep Learning Approaches For Adaptive Speech Brain-Computer Interfaces, Srdjan Lesaja Jan 2022

Theses and Dissertations

In the last two decades, there have been many breakthrough advancements in non-invasive and invasive brain-computer interface (BCI) systems. However, the majority of BCI model designs still follow a paradigm whereby neural signals are preprocessed and task-related features extracted using static, and generally customized, data-independent designs. Such BCI designs commonly optimize narrow task performance over generalizability, adaptability, and robustness, which is not well suited to meeting individual user needs. If one day BCIs are to be capable of decoding our higher-order cognitive commands and conceptual maps, their designs will need to be adaptive architectures that will evolve and grow in …


A Study On Developing Novel Methods For Relation Extraction, Darshini Mahendran Jan 2022

Theses and Dissertations

Relation Extraction (RE) is a task of Natural Language Processing (NLP) to detect and classify the relations between two entities. Relation extraction in the biomedical and scientific literature domain is challenging as text can contain multiple pairs of entities in the same instance. During the course of this research, we developed an RE framework (RelEx), which consists of five main RE paradigms: rule-based, machine learning-based, Convolutional Neural Network (CNN)-based, Bidirectional Encoder Representations from Transformers (BERT)-based, and Graph Convolutional Networks (GCNs)-based approaches. RelEx's rule-based approach uses co-location information of the entities to determine whether a relation exists between a selected entity …


Temporal Disambiguation Of Relative Temporal Expressions In Clinical Texts Using Temporally Fine-Tuned Contextual Word Embeddings, Amy L. Olex Jan 2022

Theses and Dissertations

Temporal reasoning is the ability to extract and assimilate temporal information to reconstruct a series of events such that they can be reasoned over to answer questions involving time. Temporal reasoning in the clinical domain is challenging due to specialized medical terms and nomenclature, shorthand notation, fragmented text, a variety of writing styles used by different medical units, redundancy of information that has to be reconciled, and an increased number of temporal references as compared to general domain texts. Work in the area of clinical temporal reasoning has progressed, but the current state-of-the-art still has a ways to go before …


Computational Analysis Of Drug Targets And Prediction Of Protein-Compound Interactions, Sina Ghadermarzi Jan 2022

Theses and Dissertations

Computational prediction of compound-protein interactions has generated a substantial amount of interest in recent years owing to the importance of the knowledge of these interactions for drug discovery and drug repurposing efforts. Research suggests that the currently known drug targets constitute only a fraction of a complete set of drug targets, limiting our ability to identify suitable targets to develop new drugs or to repurpose current drugs for new diseases. These efforts are further thwarted by our limited knowledge of protein-drug (and more generally protein-compound) interactions, where only a subset of drug targets is typically known for the currently used …


K-Nearest Neighbors Density-Based Clustering, Avory C. Bryant Jan 2021

Theses and Dissertations

Traditional density-based clustering approaches rely on a distance-based parameter to define data connectivity and density. However, an appropriate value of this parameter can be difficult to determine as it is highly dependent on the underlying distribution of the data. In particular, distribution parameters affect the scale of inter-group distances (e.g., variance); this dependence leads to a well-known inability to simultaneously detect clusters at varying levels of density. In this work, connectivity and density are defined according to the rank-order induced by the distance metric (i.e., invariant to the expected scale of the distances). Connectivity by k-nearest neighbors and density by …
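
The scale-invariance claim in this abstract can be illustrated with a small sketch. It builds a mutual k-nearest-neighbor graph, which is one simple rank-based notion of connectivity chosen here as an assumption (the abstract is truncated before it describes the actual construction), and shows that rescaling all coordinates leaves the graph unchanged. The function names are hypothetical.

```python
import math

def knn_indices(points, k):
    """For each point, return the indices of its k nearest neighbors."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # Sort by Euclidean distance; ties are broken by index.
        others.sort(key=lambda j: (math.dist(p, points[j]), j))
        result.append(others[:k])
    return result

def mutual_knn_edges(points, k):
    """Edges (i, j), i < j, where i and j are each among the other's k nearest."""
    nbrs = knn_indices(points, k)
    return {
        (i, j)
        for i in range(len(points))
        for j in nbrs[i]
        if i < j and i in nbrs[j]
    }

# Two well-separated toy groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.5, 5.0), (5.0, 5.5)]
scaled = [(100 * x, 100 * y) for x, y in points]

edges = mutual_knn_edges(points, k=2)
print(edges == mutual_knn_edges(scaled, k=2))  # True
```

A distance threshold tuned for `points` would need to be multiplied by 100 to work on `scaled`, whereas the neighbor ranks, and therefore the graph, stay the same.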


Continual Learning For Multi-Label Drifting Data Streams Using Homogeneous Ensemble Of Self-Adjusting Nearest Neighbors, Gavin Alberghini Jan 2021

Theses and Dissertations

Multi-label data streams are sequences of multi-label instances arriving over time to a multi-label classifier. The properties of the data stream may continuously change due to concept drift. Therefore, algorithms must adapt constantly to the new data distributions. In this paper we propose a novel ensemble method for multi-label drifting streams named Homogeneous Ensemble of Self-Adjusting Nearest Neighbors (HESAkNN). It leverages a self-adjusting kNN as a base classifier with the advantages of ensembles to adapt to concept drift in the multi-label environment. To promote diverse knowledge within the ensemble, each base classifier is given a unique subset of features and …


Reliable And Interpretable Machine Learning For Modeling Physical And Cyber Systems, Daniel L. Marino Lizarazo Jan 2021

Theses and Dissertations

Over the past decade, Machine Learning (ML) research has predominantly focused on building extremely complex models in order to improve predictive performance. The idea was that performance can be improved by adding complexity to the models. This approach proved to be successful in creating models that can approximate highly complex relationships while taking advantage of large datasets. However, this approach led to extremely complex black-box models that lack reliability and are difficult to interpret. By lack of reliability, we specifically refer to the lack of consistent (unpredictable) behavior in situations outside the training data. Lack of interpretability refers to the …


Statistical Approaches For Estimation And Comparison Of Brain Functional Connectivity, Jifang Zhao Jan 2021

Theses and Dissertations

Drug addiction can lead to many health-related problems and social concerns. Functional connectivity obtained from functional magnetic resonance imaging (fMRI) data promotes a variety of fundamental understandings in such association. Due to its complex correlation structure and large dimensionality, the modeling and analysis of the functional connectivity from neuroimage are challenging. By proposing a spatio-temporal model for multi-subject neuroimage data, we incorporate voxel-level spatio-temporal dependencies of whole-brain measurements to improve the accuracy of statistical inference. To tackle large-scale spatio-temporal neuroimage data, we develop a computationally efficient algorithm to estimate the parameters. Our method is used to identify functional connectivity and …


Improving Space Efficiency Of Deep Neural Networks, Aliakbar Panahi Jan 2021

Theses and Dissertations

Language models employ a very large number of trainable parameters. Despite being highly overparameterized, these networks often achieve good out-of-sample test performance on the original task and easily fine-tune to related tasks. Recent observations involving, for example, intrinsic dimension of the objective landscape and the lottery ticket hypothesis, indicate that often training actively involves only a small fraction of the parameter space. Thus, a question remains how large a parameter space needs to be in the first place — the evidence from recent work on model compression, parameter sharing, factorized representations, and knowledge distillation increasingly shows that models can be …


Information Architecture For A Chemical Modeling Knowledge Graph, Adam R. Luxon Jan 2021

Theses and Dissertations

Machine learning models for chemical property predictions are high-dimensional design challenges spanning multiple disciplines. Free and open-source software libraries have streamlined the model implementation process, but the design complexity remains. In order to better navigate and understand the machine learning design space, model information needs to be organized and contextualized. In this work, instances of chemical property models and their associated parameters were stored in a Neo4j property graph database. Machine learning model instances were created with permutations of dataset, learning algorithm, molecular featurization, data scaling, data splitting, hyperparameters, and hyperparameter optimization techniques. The resulting graph contains over 83,000 nodes …


Learning From Multi-Class Imbalanced Big Data With Apache Spark, William C. Sleeman IV Jan 2021

Theses and Dissertations

With data becoming a new form of currency, its analysis has become a top priority in both academia and industry, furthering advancements in high-performance computing and machine learning. However, these large, real-world datasets come with additional complications such as noise and class overlap. Problems are magnified when multi-class data is presented, especially since many of the popular algorithms were originally designed for binary data. Another challenge arises when the number of examples is not evenly distributed across all classes in a dataset. This often causes classifiers to favor the majority class over the minority classes, leading to undesirable results …
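
The majority-class bias described in this abstract can be made concrete with a short sketch. The label counts below are invented for illustration and are not from the dissertation. The sketch compares plain accuracy with macro-averaged recall for a trivial classifier that always predicts the most frequent class.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall, so every class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)

# Illustrative three-class dataset with a 90 / 7 / 3 split (made-up counts).
labels = ["a"] * 90 + ["b"] * 7 + ["c"] * 3

# A trivial baseline that always predicts the most frequent class.
majority = Counter(labels).most_common(1)[0][0]
predictions = [majority] * len(labels)

print(accuracy(labels, predictions))                 # 0.9
print(round(macro_recall(labels, predictions), 3))   # 0.333
```

The baseline never identifies a minority example yet scores 90% accuracy, which is why class-balanced metrics are commonly reported alongside accuracy on imbalanced data.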


Invariance And Invertibility In Deep Neural Networks, Han Zhang Jan 2020

Theses and Dissertations

Machine learning is concerned with computer systems that learn from data instead of being explicitly programmed to solve a particular task. One of the main approaches behind recent advances in machine learning involves neural networks with a large number of layers, often referred to as deep learning. In this dissertation, we study how to equip deep neural networks with two useful properties: invariance and invertibility. The first part of our work is focused on constructing neural networks that are invariant to certain transformations in the input, that is, some outputs of the network stay the same even if the input …


Untapped Potential Of Clinical Text For Opioid Surveillance, Amy L. Olex, Tamas Gal, Majid Afshar, Dmitriy Dligach, Niranjan Karnik, Travis Oakes, Brihat Sharma, Meng Xie, Bridget T. McInnes, Julian Solway, Abel Kho, William Cramer, F. Gerard Moeller Jan 2019

Wright Center for Clinical and Translational Research Works

Accurate surveillance is needed to combat the growing opioid epidemic. To investigate the potential volume of missed opioid overdoses, we compare overdose encounters identified by ICD-10-CM codes and an NLP pipeline from two different medical systems. Our results show that the NLP pipeline identified a larger percentage of OOD encounters than ICD-10-CM codes. Thus, incorporating sophisticated NLP techniques into current diagnostic methods has the potential to improve surveillance on the incidence of opioid overdoses.