Open Access. Powered by Scholars. Published by Universities.®

Data Science Commons

Full-Text Articles in Data Science

Determining States Of Movement In Humans Using Minimally Processed EEG Signals And Various Classification Methods, Maurice Barnett Dec 2021

All Theses

Electroencephalography (EEG) is a non-invasive technique used in both clinical and research settings to record neuronal signaling in the brain. The location of an EEG signal as well as the frequencies at which its neuronal constituents fire correlate with behavioral tasks, including discrete states of motor activity. Due to the number of channels and fine temporal resolution of EEG, a dense, high-dimensional dataset is collected. Transcranial direct current stimulation (tDCS) is a treatment that has been suggested to improve motor functions of Parkinson’s disease and chronic stroke patients when stimulation occurs during a motor task. tDCS is commonly administered without …
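The abstract notes that both the location of an EEG signal (its channel) and its frequency content correlate with motor states. One common way to turn a multichannel recording into a fixed-length classifier input is per-channel band power. The sketch below is an illustration under stated assumptions, not the thesis's pipeline: the band edges in `BANDS` and the helper names are hypothetical, and the excerpt does not say which features the author used.

```python
import numpy as np

# Hypothetical frequency bands (Hz); the thesis's own feature set is not given in the excerpt.
BANDS = {"theta": (4.0, 8.0), "mu_alpha": (8.0, 13.0), "beta": (13.0, 30.0)}


def band_power(signal, fs, low, high):
    """Mean periodogram power of a 1-D signal in the band [low, high) Hz."""
    signal = np.asarray(signal, dtype=float)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / signal.size
    in_band = (freqs >= low) & (freqs < high)
    return float(psd[in_band].mean())


def band_features(eeg, fs, bands=BANDS):
    """eeg has shape (n_channels, n_samples); returns (n_channels, n_bands) band powers."""
    return np.array([[band_power(ch, fs, lo, hi) for lo, hi in bands.values()] for ch in eeg])


# Usage: synthetic 10 Hz and 20 Hz oscillations on two channels, 250 Hz for 2 s.
fs = 250
t = np.arange(2 * fs) / fs
eeg = np.vstack([np.sin(2 * np.pi * 10 * t), 0.5 * np.sin(2 * np.pi * 20 * t)])
features = band_features(eeg, fs).ravel()  # flattened vector for a classifier
```

Flattening keeps the channel/band pairing implicit in the vector position, so any classifier that expects a fixed-length input can consume it.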


Deep Fakes: The Algorithms That Create And Detect Them And The National Security Risks They Pose, Nick Dunard Sep 2021

James Madison Undergraduate Research Journal (JMURJ)

The dissemination of deep fakes for nefarious purposes poses significant national security risks to the United States, requiring the urgent development of technologies to detect their use and strategies to mitigate their effects. Deep fakes are images and videos created by or with the assistance of AI algorithms in which a person’s likeness, actions, or words have been replaced by someone else’s to deceive an audience. Often created with the help of generative adversarial networks, deep fakes can be used to blackmail, harass, exploit, and intimidate individuals and businesses; in large-scale disinformation campaigns, they can incite political tensions around the …


Exploratory Search With Archetype-Based Language Models, Brent D. Davis Aug 2021

Electronic Thesis and Dissertation Repository

This dissertation explores how machine learning, natural language processing and information retrieval may assist the exploratory search task. Exploratory search is a search where the ideal outcome of the search is unknown, and thus the ideal language to use in a retrieval query to match it is unavailable. Three algorithms represent the contribution of this work. Archetype-based Modeling and Search provides a way to use previously identified archetypal documents relevant to an archetype to form a notion of similarity and find related documents that match the defined archetype. This is beneficial for exploratory search as it can generalize beyond standard …
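To make the idea of "similarity to an archetype" concrete, here is a minimal sketch that represents a set of archetypal documents by the centroid of their bag-of-words vectors and ranks candidates by cosine similarity to that centroid. This is a generic centroid-ranking technique chosen for illustration; it is not the dissertation's Archetype-based Modeling and Search algorithm, and all function names are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (whitespace tokenization, lowercased)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def archetype_centroid(docs):
    """Average term counts over the documents that define the archetype."""
    total = Counter()
    for doc in docs:
        total.update(vectorize(doc))
    return {term: count / len(docs) for term, count in total.items()}


def rank_by_archetype(archetype_docs, candidates):
    """Return candidates ordered from most to least similar to the archetype."""
    centroid = archetype_centroid(archetype_docs)
    scored = [(cosine(vectorize(c), centroid), c) for c in candidates]
    return [c for _, c in sorted(scored, key=lambda pair: pair[0], reverse=True)]


archetype = ["glacier ice velocity", "ice sheet mass loss"]
candidates = ["stock market prices", "ice velocity of the glacier"]
ranked = rank_by_archetype(archetype, candidates)
```

The point of the centroid is that the user never writes a query: the archetypal documents themselves stand in for the unknown "ideal language" the abstract describes.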


Applying Deep Learning To The Ice Cream Vendor Problem: An Extension Of The Newsvendor Problem, Gaffar Solihu Aug 2021

Electronic Theses and Dissertations

The Newsvendor problem is a classical supply chain problem used to develop strategies for inventory optimization. The goal of the newsvendor problem is to predict the optimal order quantity of a product to meet an uncertain demand in the future, given that the demand distribution itself is known. The Ice Cream Vendor Problem extends the classical newsvendor problem to an uncertain demand with unknown distribution, albeit a distribution that is known to depend on exogenous features. The goal is thus to estimate the order quantity that minimizes the total cost when demand does not follow any known statistical distribution. The …
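For the classical newsvendor with a known demand distribution F, the cost-minimizing order quantity is the critical fractile q* = F^{-1}(c_u / (c_u + c_o)), where c_u is the per-unit cost of running short and c_o the per-unit cost of leftover stock. The sketch below applies that rule to an empirical sample of past demand (a sample-average approximation). It is only a baseline under these assumptions; the Ice Cream Vendor setting in the thesis conditions demand on exogenous features and uses deep learning instead, which this code does not attempt.

```python
import math


def critical_ratio(underage_cost, overage_cost):
    """Fraction of demand scenarios the order should cover: c_u / (c_u + c_o)."""
    return underage_cost / (underage_cost + overage_cost)


def newsvendor_quantity(demand_samples, underage_cost, overage_cost):
    """Smallest observed demand d whose empirical CDF F(d) reaches the critical ratio."""
    ratio = critical_ratio(underage_cost, overage_cost)
    ordered = sorted(demand_samples)
    k = max(math.ceil(ratio * len(ordered)), 1)
    return ordered[k - 1]


# Hypothetical costs: a lost sale costs 3, an unsold unit costs 1, so cover 75% of scenarios.
q = newsvendor_quantity([10, 20, 30, 40], underage_cost=3, overage_cost=1)  # 30
```

Raising the cost of a lost sale pushes the ratio toward 1 and the order toward the high end of the demand sample, which is the intuition the feature-dependent extension carries over.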


Exploring The Long Tail, Joseph H. Hajjar Jun 2021

Dartmouth College Undergraduate Theses

The migration of datasets online has created a near-infinite inventory for big-name retailers such as Amazon and Netflix, giving rise to recommendation systems that help users navigate the massive catalog. This has also allowed retailers to stock much less popular, uncommon items that would not appear in a traditional brick-and-mortar setting due to the cost of storage. Moreover, previous work has highlighted the profit potential that lies in the so-called "long tail" of niche, unpopular items. Unfortunately, due to the limited amount of data in this subset of the inventory, recommendation systems often struggle …
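Before a recommender can be evaluated on the long tail, the catalog has to be split into popular "head" items and niche "tail" items. A minimal sketch, assuming a Pareto-style convention in which the head is the smallest set of top items accounting for 80% of interactions (the 0.8 threshold and the function name are assumptions, not taken from the thesis):

```python
from collections import Counter


def split_head_tail(interactions, head_share=0.8):
    """Head: most popular items that together cover `head_share` of all interactions."""
    counts = Counter(interactions)
    total = sum(counts.values())
    head, cumulative = set(), 0
    for item, count in counts.most_common():
        if cumulative >= head_share * total:
            break
        head.add(item)
        cumulative += count
    tail = set(counts) - head
    return head, tail


# Hypothetical interaction log: one blockbuster and three niche items.
log = ["a"] * 80 + ["b"] * 10 + ["c"] * 5 + ["d"] * 5
head, tail = split_head_tail(log)
```

Items in `tail` are exactly the ones with few interactions each, which is why data-hungry recommenders tend to underserve them.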


Exploring The Use Of Social Media To Infer Relationships Between Demographics, Psychographics And Vaccine Hesitancy, Abhimanyu Kapur Jun 2021

Computer Science Senior Theses

The growing popularity of social media as a platform to obtain information and share one's opinions on various topics makes it a rich source of information for research. In this study, we aimed to develop a framework to infer relationships between the demographic and psychographic characteristics of a user and their opinion on a specific narrative, in this case their stance on taking the COVID-19 vaccine. Twitter was chosen because of its large US user base and readily available data. Demographic traits included Race, Age, Gender, and Human-vs-Organization Status. Psychographic traits included the Big Five personality traits (Conscientiousness, …


Machine Learning Methods For Depression Detection Using sMRI And rs-fMRI Images, Marzieh Sadat Mousavian May 2021

LSU Doctoral Dissertations

Major Depressive Disorder (MDD) is a common disease throughout the world that negatively affects people’s lives. Early diagnosis of MDD is beneficial, so identifying practical biomarkers would help clinicians diagnose it. An automated method for finding MDD biomarkers would be helpful, although developing one is difficult. The main aim of this research is to develop a method for detecting discriminative features for MDD diagnosis based on Magnetic Resonance Imaging (MRI) data.

In this research, representational similarity analysis provides a framework to compare distributed patterns and obtain the similarity/dissimilarity of brain regions. Regions are obtained by either …
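Representational similarity analysis (RSA) is usually carried out in two steps: build a representational dissimilarity matrix (RDM) for each region from its activity patterns, then compare RDMs with a rank correlation. The sketch below follows that standard recipe with 1 − Pearson r as the dissimilarity and Spearman correlation for the comparison. The data shape (conditions × voxels) and every name here are assumptions for illustration; the excerpt does not specify how the thesis defines regions or conditions.

```python
import numpy as np


def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson r between rows (conditions)."""
    return 1.0 - np.corrcoef(patterns)


def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries, the informative part of a symmetric RDM."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]


def ranks(values):
    """Ranks 0..n-1 (ties are not averaged; acceptable for this sketch)."""
    order = np.argsort(values)
    result = np.empty(len(values))
    result[order] = np.arange(len(values))
    return result


def rdm_similarity(rdm_a, rdm_b):
    """Spearman correlation between the upper triangles of two RDMs."""
    a, b = ranks(upper_triangle(rdm_a)), ranks(upper_triangle(rdm_b))
    return float(np.corrcoef(a, b)[0, 1])


# Hypothetical data: 5 conditions x 20 voxels for two regions with the same geometry.
rng = np.random.default_rng(0)
region_a = rng.normal(size=(5, 20))
region_b = 2.0 * region_a + 1.0  # rescaled copy: Pearson-based RDM is unchanged
score = rdm_similarity(rdm(region_a), rdm(region_b))  # close to 1.0
```

Because RDMs abstract away from the voxels themselves, two regions (or subjects) with different numbers of voxels can still be compared, which is what makes RSA convenient for region-level comparisons.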


Learning From Multi-Class Imbalanced Big Data With Apache Spark, William C. Sleeman IV Jan 2021

Theses and Dissertations

With data becoming a new form of currency, its analysis has become a top priority in both academia and industry, furthering advancements in high-performance computing and machine learning. However, these large, real-world datasets come with additional complications such as noise and class overlap. These problems are magnified when the data has multiple classes, especially since many popular algorithms were originally designed for binary data. Another challenge arises when examples are not evenly distributed across the classes in a dataset. This often causes classifiers to favor the majority class over the minority classes, leading to undesirable results …
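The simplest response to multi-class imbalance is random oversampling: duplicate minority-class examples until every class is as large as the majority class. The sketch below shows that baseline in plain Python on a small in-memory dataset. It is a swapped-in illustration, not the thesis's method, and it does not use Apache Spark; the function name and seed are hypothetical.

```python
import random
from collections import defaultdict


def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until every class matches the majority count."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for xi in rows + extra:
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out


# Three classes with a 6 : 3 : 1 imbalance.
X = list(range(10))
y = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
X_bal, y_bal = random_oversample(X, y)
```

Duplicating rows balances the class counts but adds no new information, so it can encourage overfitting to the few minority examples; that limitation is one reason more elaborate resampling methods exist.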


Identification And Classification Of Radio Pulsar Signals Using Machine Learning, Di Pang Jan 2021

Graduate Theses, Dissertations, and Problem Reports

Automated single-pulse search approaches are necessary because the ever-increasing amount of observed data makes manual inspection impractical. Detecting radio pulsars using single-pulse searches, however, is a challenging problem for machine learning because pulsar signals often vary significantly in brightness, width, and shape and are detected in only a small fraction of the observed data.

The research work presented in this dissertation focuses on the development of machine learning algorithms and approaches for single-pulse searches in the time domain. Specifically, (1) we developed a two-stage single-pulse search approach, named Single-Pulse Event Group IDentification (SPEGID), which automatically identifies and clas…


Revisiting Absolute Pose Regression, Hunter Blanton Jan 2021

Theses and Dissertations--Computer Science

Images provide direct evidence for the position and orientation of the camera in space, known as camera pose. Traditionally, the problem of estimating the camera pose requires reference data for determining image correspondence and leveraging geometric relationships between features in the image. Recent advances in deep learning have led to a new class of methods that regress the pose directly from a single image.

This thesis proposes methods for absolute camera pose regression. Absolute pose regression estimates the pose of a camera from a single image as the output of a fixed computation pipeline. These methods have many practical benefits …
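Absolute pose regression networks output a camera position and an orientation, and they are commonly trained with a loss that adds position error to a weighted rotation error. The sketch below computes a PoseNet-style loss (Euclidean position error plus β times the distance between the true unit quaternion and the normalized predicted quaternion). This is a widely used formulation from the literature, not necessarily the one proposed in this thesis, and the default `beta` value is an assumption.

```python
import math


def norm(v):
    """Euclidean length of a vector given as a sequence of floats."""
    return math.sqrt(sum(c * c for c in v))


def pose_loss(t_true, q_true, t_pred, q_pred, beta=500.0):
    """Position error plus beta-weighted quaternion error (PoseNet-style).

    q_true is assumed to be a unit quaternion; q_pred is normalized here because
    a network's raw output generally is not.
    """
    pos_err = norm([a - b for a, b in zip(t_true, t_pred)])
    q_pred_norm = norm(q_pred)
    q_unit = [c / q_pred_norm for c in q_pred]
    rot_err = norm([a - b for a, b in zip(q_true, q_unit)])
    return pos_err + beta * rot_err


identity = [1.0, 0.0, 0.0, 0.0]
exact = pose_loss([1.0, 2.0, 3.0], identity, [1.0, 2.0, 3.0], [2.0, 0.0, 0.0, 0.0])
shifted = pose_loss([0.0, 0.0, 0.0], identity, [3.0, 4.0, 0.0], identity)
```

`beta` trades off meters against quaternion units, so it has to be tuned per scene. Note also that q and −q encode the same rotation, which this simple distance does not account for.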


Reliable And Interpretable Machine Learning For Modeling Physical And Cyber Systems, Daniel L. Marino Lizarazo Jan 2021

Theses and Dissertations

Over the past decade, Machine Learning (ML) research has predominantly focused on building extremely complex models in order to improve predictive performance, on the premise that adding complexity to a model improves its performance. This approach proved successful in creating models that can approximate highly complex relationships while taking advantage of large datasets. However, it also led to extremely complex black-box models that lack reliability and are difficult to interpret. By lack of reliability, we specifically refer to inconsistent (unpredictable) behavior in situations outside the training data. Lack of interpretability refers to the …


Binary Black Widow Optimization Algorithm For Feature Selection Problems, Ahmed Al-Saedi Jan 2021

Theses and Dissertations (Comprehensive)

This thesis addresses feature selection (FS), a primary stage in data mining. FS is an important pre-processing stage that improves performance in terms of computation cost and accuracy, and it offers a better understanding of stored data by removing unnecessary and irrelevant features from the original dataset. However, because the number of candidate feature subsets grows exponentially with the number of features, FS is very challenging and has been classified as an NP-hard problem. Traditional methods can only solve small problems. Therefore, metaheuristic algorithms (MAs) are becoming powerful methods for addressing FS problems. …
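Binary metaheuristics for feature selection typically share two ingredients regardless of the search rule: a transfer function that turns a continuous search position into a 0/1 feature mask, and a fitness that trades classification error against subset size. The sketch below shows an S-shaped (sigmoid) transfer function and the common weighted fitness α·error + (1 − α)·|S|/|N|. The Binary Black Widow update rule itself is not described in the excerpt and is not reproduced here; `alpha = 0.99` and the caller-supplied `error_rate` (from whatever classifier is trained on the selected features) are assumptions.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function (S-shaped transfer function)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binarize(position, rng):
    """Select feature j with probability sigmoid(position[j])."""
    return [1 if rng.random() < sigmoid(p) else 0 for p in position]


def fitness(mask, error_rate, alpha=0.99):
    """Weighted sum of classification error and fraction of selected features (lower is better)."""
    return alpha * error_rate + (1 - alpha) * sum(mask) / len(mask)


rng = random.Random(0)
mask = binarize([50.0, -50.0, 50.0], rng)  # strongly positive -> kept, strongly negative -> dropped
score = fitness(mask, error_rate=0.1)
```

With α close to 1, accuracy dominates and subset size only breaks near-ties, which matches the stated goal of removing irrelevant features without sacrificing accuracy.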


Inference Of Surface Velocities From Oblique Time Lapse Photos And Terrestrial Based LiDAR At The Helheim Glacier, Franklyn T. Dunbar II Jan 2021

Graduate Student Theses, Dissertations, & Professional Papers

Using time-dependent observations derived from terrestrial LiDAR and oblique time-lapse imagery, we demonstrate that a Bayesian approach to glacial motion estimation provides a concise way to incorporate multiple data products into a single motion estimation procedure, effectively producing surface velocity estimates with an associated uncertainty. This approach brings both improved computational efficiency and greater scalability across observational time-frames when compared to existing methods. To gauge efficacy, we apply these methods to a set of observations from the Helheim Glacier, a critical actor in contemporary mass loss trends observed in the Greenland Ice Sheet. We find that …
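The simplest instance of combining several data products into one estimate with an uncertainty is Gaussian fusion: with a flat prior and independent Gaussian measurements of the same velocity, the posterior precision is the sum of the precisions and the posterior mean is the precision-weighted average. The sketch below shows only that textbook case; the thesis's actual Bayesian model is not described in the excerpt, and the example numbers are hypothetical.

```python
import math


def fuse_gaussian(estimates):
    """Combine independent Gaussian estimates [(mean, std), ...] of the same quantity.

    Assumes a flat prior: posterior precision = sum of precisions, posterior mean =
    precision-weighted mean. Returns (posterior_mean, posterior_std).
    """
    precision = sum(1.0 / std ** 2 for _, std in estimates)
    mean = sum(m / std ** 2 for m, std in estimates) / precision
    return mean, math.sqrt(1.0 / precision)


# Hypothetical velocities (m/day) from a LiDAR-derived and an image-derived product.
fused_mean, fused_std = fuse_gaussian([(10.0, 1.0), (12.0, 1.0)])
```

The fused standard deviation is always smaller than that of any single input, which is the formal sense in which adding a data product tightens the velocity estimate.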


Convolutional Audio Source Separation Applied To Drum Signal Separation, Marius Orehovschi Jan 2021

Honors Theses

This study examined the task of drum signal separation from full music mixes via both classical methods (Independent Component Analysis) and a combination of Time-Frequency Binary Masking and Convolutional Neural Networks. The results indicate that classical methods relying on predefined computations do not achieve any meaningful results, while convolutional neural networks can achieve imperfect but musically useful results. Furthermore, neural network performance can be improved by data augmentation via transposition – a technique that can only be applied in the context of drum signal separation.
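Time-frequency binary masking keeps each short-time Fourier transform (STFT) bin of the mixture where the target source dominates and zeroes the rest. When the separate stems are known, this gives an "ideal" binary mask, which a convolutional network can be trained to predict from the mixture alone. The sketch below computes and applies such a mask with NumPy. The STFT parameters, the pure-tone stand-ins for drums and accompaniment, and all function names are assumptions, and the network itself is omitted.

```python
import numpy as np


def stft(x, n_fft=512, hop=256):
    """Hann-windowed short-time Fourier transform; shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * window for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1)


def ideal_binary_mask(target_stft, residual_stft):
    """1.0 in bins where the target (drums) is louder than everything else, 0.0 elsewhere."""
    return (np.abs(target_stft) > np.abs(residual_stft)).astype(float)


def apply_mask(mixture_stft, mask):
    """Keep the selected complex bins of the mixture; phase is taken from the mixture."""
    return mixture_stft * mask


# Hypothetical stems: a 2 kHz tone stands in for drums, a 200 Hz tone for the rest of the mix.
fs = 8000
t = np.arange(fs) / fs
drums = np.sin(2 * np.pi * 2000 * t)
rest = np.sin(2 * np.pi * 200 * t)
mask = ideal_binary_mask(stft(drums), stft(rest))
estimate = apply_mask(stft(drums + rest), mask)
```

Because the mask is binary, each bin is assigned wholly to one source, which works well when sources occupy different bins and produces artifacts where they overlap; that is one reason the results described here are imperfect but still musically useful.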