Open Access. Powered by Scholars. Published by Universities.®
- Institution
- Dartmouth College (4)
- Virginia Commonwealth University (3)
- City University of New York (CUNY) (2)
- New Jersey Institute of Technology (2)
- University of Kentucky (2)
- University of Louisville (2)
- West Virginia University (2)
- Western University (2)
- Air Force Institute of Technology (1)
- Chapman University (1)
- Clemson University (1)
- Colby College (1)
- East Tennessee State University (1)
- Embry-Riddle Aeronautical University (1)
- Kennesaw State University (1)
- Louisiana State University (1)
- Missouri State University (1)
- The Texas Medical Center Library (1)
- The University of Southern Mississippi (1)
- University of Massachusetts Amherst (1)
- University of Montana (1)
- University of New Mexico (1)
- University of Tennessee, Knoxville (1)
- Wilfrid Laurier University (1)
- Publication
- Theses and Dissertations (4)
- Computer Science Senior Theses (3)
- Electronic Theses and Dissertations (3)
- Dissertations (2)
- Doctoral Dissertations (2)
- Electronic Thesis and Dissertation Repository (2)
- Graduate Theses, Dissertations, and Problem Reports (2)
- Theses and Dissertations--Computer Science (2)
- All Theses (1)
- Computational and Data Sciences (PhD) Dissertations (1)
- Dartmouth College Undergraduate Theses (1)
- Dissertations & Theses (Open Access) (1)
- Dissertations and Theses (1)
- Dissertations, Theses, and Capstone Projects (1)
- Doctoral Dissertations and Master's Theses (1)
- Electrical and Computer Engineering ETDs (1)
- Graduate Student Theses, Dissertations, & Professional Papers (1)
- Honors Theses (1)
- LSU Doctoral Dissertations (1)
- MSU Graduate Theses (1)
- Master of Science in Computer Science Theses (1)
- Theses (1)
- Theses and Dissertations (Comprehensive) (1)
Articles 1 - 30 of 35
Full-Text Articles in Data Science
Adaptive Multi-Label Classification On Drifting Data Streams, Martha Roseberry
Theses and Dissertations
Drifting data streams and multi-label data are both challenging problems. When multi-label data arrives as a stream, the challenges of both problems must be addressed along with additional challenges unique to the combined problem. Algorithms must be fast and flexible, able to match both the speed and evolving nature of the stream. We propose four methods for learning from multi-label drifting data streams. First, a multi-label k Nearest Neighbors with Self Adjusting Memory (ML-SAM-kNN) exploits short- and long-term memories to predict the current and evolving states of the data stream. Second, a punitive k nearest neighbors algorithm with a self-adjusting …
Learning Mortality Risk For Covid-19 Using Machine Learning And Statistical Methods, Shaoshi Zhang
Electronic Thesis and Dissertation Repository
This research investigates the mortality risk of COVID-19 patients across different variant waves, using data from the Centers for Disease Control and Prevention (CDC) websites. By analyzing the available data, including patient medical records, vaccination rates, and hospital capacities, we aim to discern patterns and factors associated with COVID-19-related deaths.
To explore features linked to COVID-19 mortality, we employ different techniques such as Filter, Wrapper, and Embedded methods for feature selection. Furthermore, we apply various machine learning methods, including support vector machines, decision trees, random forests, logistic regression, K-nearest neighbours, naïve Bayes methods, and artificial neural networks, to uncover underlying …
Cm-Ii Meditation As An Intervention To Reduce Stress And Improve Attention: A Study Of Ml Detection, Spectral Analysis, And Hrv Metrics, Sreekanth Gopi
Master of Science in Computer Science Theses
Students frequently face heightened stress due to academic and social pressures, particularly in demanding fields like computer science and engineering. These challenges are often associated with serious mental health issues, including ADHD (Attention Deficit Hyperactivity Disorder), depression, and an increased risk of suicide. The average student attention span has notably decreased from 2½ minutes to just 47 seconds, and now it typically takes about 25 minutes to switch attention to a new task (Mark, 2023). Research findings suggest that over 95% of individuals who die by suicide have been diagnosed with depression (Shahtahmasebi, 2013), and almost 20% of students …
Exact Models, Heuristics, And Supervised Learning Approaches For Vehicle Routing Problems, Zefeng Lyu
Doctoral Dissertations
This dissertation presents contributions to the field of vehicle routing problems by utilizing exact methods, heuristic approaches, and the integration of machine learning with traditional algorithms. The research is organized into three main chapters, each dedicated to a specific routing problem and a unique methodology. The first chapter addresses the Pickup and Delivery Problem with Transshipments and Time Windows, a variant that permits product transfers between vehicles to enhance logistics flexibility and reduce costs. To solve this problem, we propose an efficient mixed-integer linear programming model that has been shown to outperform existing ones. The second chapter discusses a practical …
Towards Robust Long-Form Text Generation Systems, Kalpesh Krishna
Doctoral Dissertations
Text generation is an important emerging AI technology that has seen significant research advances in recent years. Due to its closeness to how humans communicate, mastering text generation technology can unlock several important applications such as intelligent chat-bots, creative writing assistance, or newer applications like task-agnostic few-shot learning. Most recently, the rapid scaling of large language models (LLMs) has resulted in systems like ChatGPT, capable of generating fluent, coherent and human-like text. However, despite their remarkable capabilities, LLMs still suffer from several limitations, particularly when generating long-form text. In particular, (1) long-form generated text is filled with factual inconsistencies to …
Spoken Language Processing And Modeling For Aviation Communications, Aaron Van De Brook
Doctoral Dissertations and Master's Theses
With recent advances in machine learning and deep learning technologies and the creation of larger aviation-specific corpora, applying natural language processing technologies, especially those based on transformer neural networks, to aviation communications is becoming increasingly feasible. Previous work has focused on machine learning applications to natural language processing, such as N-grams and word lattices. This thesis experiments with a process for pretraining transformer-based language models on aviation English corpora and compares the effectiveness and performance of language models transfer learned from pretrained checkpoints and those trained from their base weight initializations (trained from scratch). The results suggest that transformer language …
Data-Optimized Spatial Field Predictions For Robotic Adaptive Sampling: A Gaussian Process Approach, Zachary Nathan
Computer Science Senior Theses
We introduce a framework that combines Gaussian Process models, robotic sensor measurements, and sampling data to predict spatial fields. In this context, a spatial field refers to the distribution of a variable throughout a specific area, such as temperature or pH variations over the surface of a lake. Whereas existing methods tend to analyze only the particular field(s) of interest, our approach optimizes predictions through the effective use of all available data. We validated our framework on several datasets, showing that errors can decline by up to two-thirds through the inclusion of additional colocated measurements. In support of adaptive sampling, …
The Basil Technique: Bias Adaptive Statistical Inference Learning Agents For Learning From Human Feedback, Jonathan Indigo Watson
Theses and Dissertations--Computer Science
We introduce a novel approach for learning behaviors using human-provided feedback that is subject to systematic bias. Our method, known as BASIL, models the feedback signal as a combination of a heuristic evaluation of an action's utility and a probabilistically-drawn bias value, characterized by unknown parameters. We present both the general framework for our technique and specific algorithms for biases drawn from a normal distribution. We evaluate our approach across various environments and tasks, comparing it to interactive and non-interactive machine learning methods, including deep learning techniques, using human trainers and a synthetic oracle with feedback distorted to varying degrees. …
The Interaction Of Different Primary Producers And Physical And Chemical Dynamics Of An Urban Shallow Lake, Majid Sahin
Dissertations, Theses, and Capstone Projects
An artificial urban shallow lake, Prospect Park Lake (PPL), is situated on a terminal moraine in Brooklyn, New York, and supplied with municipal water treated with ortho-phosphates. The constant input of the phosphate nutrient is the primary source of eutrophication in the lake. The numerous pools along the water course house various aquatic phototrophs, which influence the water quality and the state of the system, driving conditions into favoring the survival of their species. In the first half of the dissertation, the focus of the project is on analyzing how the different primary producers in different regions of PPL affect …
Data Collection And Machine Learning Methods For Automated Pedestrian Facility Detection And Mensuration, Joseph Bailey Luttrell Iv
Dissertations
Large-scale collection of pedestrian facility (crosswalks, sidewalks, etc.) presence data is vital to the success of efforts to improve pedestrian facility management, safety analysis, and road network planning. However, this kind of data is typically not available on a large scale due to the high labor and time costs that are the result of relying on manual data collection methods. Therefore, methods for automating this process using techniques such as machine learning are currently being explored by researchers. In our work, we mainly focus on machine learning methods for the detection of crosswalks and sidewalks from both aerial and street-view …
Leveraging Context Patterns For Medical Entity Classification, Garrett Johnston
Computer Science Senior Theses
The ability of patients to understand health-related text is important for optimal health outcomes. A system that can automatically annotate medical entities could help patients better understand health-related text. Such a system would also accelerate manual data annotation for this low-resource domain as well as assist in downstream medical NLP tasks such as finding textual similarity, identifying conflicting medical advice, and aspect-based sentiment analysis. In this work, we investigate a state-of-the-art entity set expansion model, BootstrapNet, for the task of medical entity classification on a new dataset of medical advice text. We also propose EP SBERT, a simple model …
Un-Fair Trojan: Targeted Backdoor Attacks Against Model Fairness, Nicholas Furth
Theses
Machine learning models have been shown to be vulnerable against various backdoor and data poisoning attacks that adversely affect model behavior. Additionally, these attacks have been shown to make unfair predictions with respect to certain protected features. In federated learning, where multiple local models contribute to a single global model communicating only using local gradients, the issue of attacks becomes more prevalent and complex. Previously published works revolve around solving these issues both individually and jointly. However, there has been little study on the effects of attacks against model fairness. Demonstrated in this work, a flexible attack, which we call Un-Fair …
Computational Approaches To Facilitate Automated Interchange Between Music And Art, Rao Hamza Ali
Computational and Data Sciences (PhD) Dissertations
Recently, there has been a tremendous increase in generating and synthesizing music and art using various computational techniques. An area that is still under-researched, however, is how one medium can be converted into the other, while maintaining the overall aesthetics. Over the last few centuries, artists, composers, and scholars have attempted to substitute one form of art for the other: by proposing techniques where music notes are synonymous to colors, by inventing instruments that combine the aesthetics of music and visual art, and by incorporating the two media in live performances. A widely accepted computational approach, for the conversion, …
Beyond Accuracy In Machine Learning., Aneseh Alvanpour
Electronic Theses and Dissertations
Machine Learning (ML) algorithms are widely used in our daily lives. The need to increase the accuracy of ML models has led to building increasingly powerful and complex algorithms known as black-box models which do not provide any explanations about the reasons behind their output. On the other hand, there are white-box ML models which are inherently interpretable while having lower accuracy compared to black-box models. To have a productive and practical algorithmic decision system, precise predictions may not be sufficient. The system may need to have transparency and be able to provide explanations, especially in applications with safety-critical contexts …
New Debiasing Strategies In Collaborative Filtering Recommender Systems: Modeling User Conformity, Multiple Biases, And Causality., Mariem Boujelbene
Electronic Theses and Dissertations
Recommender Systems are widely used to personalize the user experience in a diverse set of online applications ranging from e-commerce and education to social media and online entertainment. These State of the Art AI systems can suffer from several biases that may occur at different stages of the recommendation life-cycle. For instance, using biased data to train recommendation models may lead to several issues, such as the discrepancy between online and offline evaluation, decreasing the recommendation performance, and hurting the user experience. Bias can occur during the data collection stage where the data inherits the user-item interaction biases, such as …
Intra-Hour Solar Forecasting Using Cloud Dynamics Features Extracted From Ground-Based Infrared Sky Images, Guillermo Terrén-Serrano
Electrical and Computer Engineering ETDs
Due to the increasing use of photovoltaic systems, power grids are vulnerable to the projection of shadows from moving clouds. An intra-hour solar forecast provides power grids with the capability of automatically controlling the dispatch of energy, reducing the additional cost for a guaranteed, reliable supply of energy (i.e., energy storage). This dissertation introduces a novel sky imager consisting of a long-wave radiometric infrared camera and a visible light camera with a fisheye lens. The imager is mounted on a solar tracker to maintain the Sun in the center of the images throughout the day, reducing the scattering effect produced …
A Citizen-Science Approach For Urban Flood Risk Analysis Using Data Science And Machine Learning, Candace Agonafir
Dissertations and Theses
Street flooding is problematic in urban areas, where impervious surfaces, such as concrete, brick, and asphalt prevail, impeding the infiltration of water into the ground. During rain events, water ponds and rises to levels that cause considerable economic damage and physical harm. The main goal of this dissertation is to develop novel approaches toward the comprehension of urban flood risk using data science techniques on crowd-sourced data. This is accomplished by developing a series of data-driven models to identify flood factors of significance and localized areas of flood vulnerability in New York City (NYC). First, the infrastructural (catch basin clogs, …
Classifying Blood Glucose Levels Through Noninvasive Features, Rishi Reddy
Graduate Theses, Dissertations, and Problem Reports
Blood glucose monitoring is a key process in the prevention and management of certain chronic diseases, such as diabetes. Currently, those interested in monitoring their blood glucose levels are confronted with options that are primarily invasive and relatively costly. A growing topic of note is the development of non-invasive monitoring methods for blood glucose. This development holds a significant promise for improvement to the quality of life of a significant portion of the population and is overall met with great enthusiasm from the scientific community as well as commercial interest. This work aims to develop a potential pipeline …
Determining States Of Movement In Humans Using Minimally Processed Eeg Signals And Various Classification Methods, Maurice Barnett
All Theses
Electroencephalography (EEG) is a non-invasive technique used in both clinical and research settings to record neuronal signaling in the brain. The location of an EEG signal as well as the frequencies at which its neuronal constituents fire correlate with behavioral tasks, including discrete states of motor activity. Due to the number of channels and fine temporal resolution of EEG, a dense, high-dimensional dataset is collected. Transcranial direct current stimulation (tDCS) is a treatment that has been suggested to improve motor functions of Parkinson’s disease and chronic stroke patients when stimulation occurs during a motor task. tDCS is commonly administered without …
Exploratory Search With Archetype-Based Language Models, Brent D. Davis
Electronic Thesis and Dissertation Repository
This dissertation explores how machine learning, natural language processing and information retrieval may assist the exploratory search task. Exploratory search is a search where the ideal outcome of the search is unknown, and thus the ideal language to use in a retrieval query to match it is unavailable. Three algorithms represent the contribution of this work. Archetype-based Modeling and Search provides a way to use previously identified archetypal documents relevant to an archetype to form a notion of similarity and find related documents that match the defined archetype. This is beneficial for exploratory search as it can generalize beyond standard …
Applying Deep Learning To The Ice Cream Vendor Problem: An Extension Of The Newsvendor Problem, Gaffar Solihu
Electronic Theses and Dissertations
The Newsvendor problem is a classical supply chain problem used to develop strategies for inventory optimization. The goal of the newsvendor problem is to predict the optimal order quantity of a product to meet an uncertain demand in the future, given that the demand distribution itself is known. The Ice Cream Vendor Problem extends the classical newsvendor problem to an uncertain demand with unknown distribution, albeit a distribution that is known to depend on exogenous features. The goal is thus to estimate the order quantity that minimizes the total cost when demand does not follow any known statistical distribution. The …
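For orientation, the classical case described above (known demand distribution) has a closed-form answer: order up to the demand quantile given by the critical ratio c_u / (c_u + c_o), where c_u is the per-unit cost of ordering too little and c_o the per-unit cost of ordering too much. The sketch below is illustrative only and is not taken from the thesis; it assumes normally distributed demand, and the function name, parameter names, and numbers are hypothetical.

```python
from statistics import NormalDist


def newsvendor_quantity(mean, stdev, underage_cost, overage_cost):
    """Classical newsvendor order quantity for normally distributed demand.

    Assumption for illustration: demand ~ Normal(mean, stdev). The optimal
    quantity is the demand quantile at the critical ratio
    underage_cost / (underage_cost + overage_cost).
    """
    critical_ratio = underage_cost / (underage_cost + overage_cost)
    return NormalDist(mu=mean, sigma=stdev).inv_cdf(critical_ratio)


# Equal costs: the critical ratio is 0.5, so the order equals the median demand.
q_symmetric = newsvendor_quantity(100.0, 20.0, underage_cost=5.0, overage_cost=5.0)

# Lost sales cost more than leftovers: the order rises above mean demand.
q_high_margin = newsvendor_quantity(100.0, 20.0, underage_cost=9.0, overage_cost=1.0)
```

When the demand distribution is unknown, as in the extension the abstract describes, this quantile cannot be computed directly and has to be estimated from data.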
Exploring The Long Tail, Joseph H. Hajjar
Dartmouth College Undergraduate Theses
The migration of datasets online has created a near-infinite inventory for big name retailers such as Amazon and Netflix, giving rise to recommendation systems to assist users in navigating the massive catalog. This has also allowed for the possibility of retailers storing much less popular, uncommon items which would not appear in a more traditional brick-and-mortar setting due to the cost of storage. Nevertheless, previous work has highlighted the profit potential which lies in the so-called "long tail" of niche, unpopular items. Unfortunately, due to the limited amount of data in this subset of the inventory, recommendation systems often struggle …
Exploring The Use Of Social Media To Infer Relationships Between Demographics, Psychographics And Vaccine Hesitancy, Abhimanyu Kapur
Computer Science Senior Theses
The growing popularity of social media as a platform to obtain information and share one's opinions on various topics makes it a rich source of information for research. In this study, we aimed to develop a framework to infer relationships between demographic and psychographic characteristics of a user and their opinion on a specific narrative - in this case, their stance on taking the COVID-19 vaccine. Twitter was the chosen platform due to the large USA user base and easily available data. Demographic traits included Race, Age, Gender, and Human-vs-Organization Status. Psychographic traits included the Big Five personality traits (Conscientiousness, …
Machine Learning Methods For Depression Detection Using Smri And Rs-Fmri Images, Marzieh Sadat Mousavian
LSU Doctoral Dissertations
Major Depression Disorder (MDD) is a common disease throughout the world that negatively influences people’s lives. Early diagnosis of MDD is beneficial, so detecting practical biomarkers would aid clinicians in the diagnosis of MDD. Having an automated method to find biomarkers for MDD is helpful even though it is difficult. The main aim of this research is to generate a method for detecting discriminative features for MDD diagnosis based on Magnetic Resonance Imaging (MRI) data.
In this research, representational similarity analysis provides a framework to compare distributed patterns and obtain the similarity/dissimilarity of brain regions. Regions are obtained by either …
Convolutional Audio Source Separation Applied To Drum Signal Separation, Marius Orehovschi
Honors Theses
This study examined the task of drum signal separation from full music mixes via both classical methods (Independent Component Analysis) and a combination of Time-Frequency Binary Masking and Convolutional Neural Networks. The results indicate that classical methods relying on predefined computations do not achieve any meaningful results, while convolutional neural networks can achieve imperfect but musically useful results. Furthermore, neural network performance can be improved by data augmentation via transposition – a technique that can only be applied in the context of drum signal separation.
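As a rough illustration of time-frequency binary masking, the sketch below builds an "ideal" binary mask from magnitude spectrograms of the drum track and the remaining instruments, then applies it to the mixture. It is a minimal sketch under stated assumptions, not the study's implementation: the spectrograms are tiny hand-written grids (rows are frequency bins, columns are time frames), the function names are hypothetical, and in a learned system a network would have to predict the mask because the isolated sources are not available at test time.

```python
def binary_mask(target_mag, other_mag):
    """Return 1 where the target is louder than everything else, else 0."""
    return [
        [1 if t > o else 0 for t, o in zip(t_row, o_row)]
        for t_row, o_row in zip(target_mag, other_mag)
    ]


def apply_mask(mask, mixture_mag):
    """Keep only the mixture bins that the mask assigns to the target."""
    return [
        [m * x for m, x in zip(m_row, x_row)]
        for m_row, x_row in zip(mask, mixture_mag)
    ]


# Hypothetical 2x2 magnitude spectrograms (frequency bins x time frames).
drums = [[0.9, 0.1], [0.2, 0.8]]
rest = [[0.3, 0.5], [0.6, 0.1]]
# Simplifying assumption: mixture magnitudes are the sum of source magnitudes.
mixture = [[d + r for d, r in zip(d_row, r_row)] for d_row, r_row in zip(drums, rest)]

mask = binary_mask(drums, rest)
drum_estimate = apply_mask(mask, mixture)
```

Bins dominated by the drums pass through unchanged and all other bins are zeroed, which is why the result is audibly imperfect wherever drums and other instruments overlap.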
Learning From Multi-Class Imbalanced Big Data With Apache Spark, William C. Sleeman Iv
Theses and Dissertations
With data becoming a new form of currency, its analysis has become a top priority in both academia and industry, furthering advancements in high-performance computing and machine learning. However, these large, real-world datasets come with additional complications such as noise and class overlap. Problems are magnified when multi-class data is presented, especially since many of the popular algorithms were originally designed for binary data. Another challenge arises when the number of examples is not evenly distributed across all classes in a dataset. This often causes classifiers to favor the majority class over the minority classes, leading to undesirable results …
Identification And Classification Of Radio Pulsar Signals Using Machine Learning, Di Pang
Graduate Theses, Dissertations, and Problem Reports
Automated single-pulse search approaches are necessary as the ever-increasing amount of observed data makes manual inspection impractical. Detecting radio pulsars using single-pulse searches, however, is a challenging problem for machine learning because pulsar signals often vary significantly in brightness, width, and shape and are only detected in a small fraction of observed data.
The research work presented in this dissertation is focused on the development of machine learning algorithms and approaches for single-pulse searches in the time domain. Specifically, (1) We developed a two-stage single-pulse search approach, named Single-Pulse Event Group IDentification (SPEGID), which automatically identifies and clas- …
Revisiting Absolute Pose Regression, Hunter Blanton
Theses and Dissertations--Computer Science
Images provide direct evidence for the position and orientation of the camera in space, known as camera pose. Traditionally, the problem of estimating the camera pose requires reference data for determining image correspondence and leveraging geometric relationships between features in the image. Recent advances in deep learning have led to a new class of methods that regress the pose directly from a single image.
This thesis proposes methods for absolute camera pose regression. Absolute pose regression estimates the pose of a camera from a single image as the output of a fixed computation pipeline. These methods have many practical benefits …
Reliable And Interpretable Machine Learning For Modeling Physical And Cyber Systems, Daniel L. Marino Lizarazo
Theses and Dissertations
Over the past decade, Machine Learning (ML) research has predominantly focused on building extremely complex models in order to improve predictive performance. The idea was that performance can be improved by adding complexity to the models. This approach proved to be successful in creating models that can approximate highly complex relationships while taking advantage of large datasets. However, this approach led to extremely complex black-box models that lack reliability and are difficult to interpret. By lack of reliability, we specifically refer to the lack of consistent (unpredictable) behavior in situations outside the training data. Lack of interpretability refers to the …
Binary Black Widow Optimization Algorithm For Feature Selection Problems, Ahmed Al-Saedi
Theses and Dissertations (Comprehensive)
This thesis addresses feature selection (FS), a primary stage in data mining. FS is a significant pre-processing stage that enhances the performance of the process with regard to computation cost and accuracy and offers a better comprehension of stored data by removing the unnecessary and irrelevant features from the basic dataset. However, because of the size of the problem, FS is known to be very challenging and has been classified as an NP-hard problem. Traditional methods can only be used to solve small problems. Therefore, metaheuristic algorithms (MAs) are becoming powerful methods for addressing the FS problems. …