Open Access. Powered by Scholars. Published by Universities.®

Data Science Commons

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 17 of 17

Full-Text Articles in Data Science

Automated Identification And Mapping Of Interesting Mineral Spectra In Crism Images, Arun M. Saranathan Mar 2024

Automated Identification And Mapping Of Interesting Mineral Spectra In Crism Images, Arun M. Saranathan

Doctoral Dissertations

The Compact Reconnaissance Imaging Spectrometer for Mars (CRISM) has proven to be an invaluable tool for the mineralogical analysis of the Martian surface. It has been crucial in identifying and mapping the spatial extents of various minerals. Primarily, the identification and mapping of these mineral spectral-shapes have been performed manually. Given the size of the CRISM image dataset, manual analysis of the full dataset would be arduous/infeasible. This dissertation attempts to address this issue by describing an (machine learning based) automated processing pipeline for CRISM data that can be used to identify and map the unique mineral signatures present in …


Statistical And Biological Analyses Of Acoustic Signals In Estrildid Finches, Moises Rivera Jun 2023

Statistical And Biological Analyses Of Acoustic Signals In Estrildid Finches, Moises Rivera

Dissertations, Theses, and Capstone Projects

Acoustic communication is a process that involves auditory perception and signal processing. Discrimination and recognition further require cognitive processes and supporting mechanisms in order to successfully identify and appropriately respond to signal senders. Although acoustic communication is common across birds, classical research has largely disregarded the perceptual abilities of perinatal altricial taxa. Chapter 1 reviews the literature of perinatal acoustic stimulation in birds, highlighting the disproportionate focus on precocial birds (e.g., chickens, ducks, quails). The long-held belief that altricial birds were incapable of acoustic perception in ovo was only recently overturned, as researchers began to find behavioral and physiological evidence …


Comparison Of Sampling Methods For Predicting Wine Quality Based On Physicochemical Properties, Robert Burigo, Scott Frazier, Eli Kravez, Nibhrat Lohia Apr 2023

Comparison Of Sampling Methods For Predicting Wine Quality Based On Physicochemical Properties, Robert Burigo, Scott Frazier, Eli Kravez, Nibhrat Lohia

SMU Data Science Review

Using the physicochemical properties of wine to predict quality has been done in numerous studies. Given the nature of these properties, the data is inherently skewed. Previous works have focused on handful of sampling techniques to balance the data. This research compares multiple sampling techniques in predicting the target with limited data. For this purpose, an ensemble model is used to evaluate the different techniques. There was no evidence found in this research to conclude that there are specific oversampling methods that improve random forest classifier for a multi-class problem.


The Basil Technique: Bias Adaptive Statistical Inference Learning Agents For Learning From Human Feedback, Jonathan Indigo Watson Jan 2023

The Basil Technique: Bias Adaptive Statistical Inference Learning Agents For Learning From Human Feedback, Jonathan Indigo Watson

Theses and Dissertations--Computer Science

We introduce a novel approach for learning behaviors using human-provided feedback that is subject to systematic bias. Our method, known as BASIL, models the feedback signal as a combination of a heuristic evaluation of an action's utility and a probabilistically-drawn bias value, characterized by unknown parameters. We present both the general framework for our technique and specific algorithms for biases drawn from a normal distribution. We evaluate our approach across various environments and tasks, comparing it to interactive and non-interactive machine learning methods, including deep learning techniques, using human trainers and a synthetic oracle with feedback distorted to varying degrees. …


Transfer Learning Using Infrared And Optical Full Motion Video Data For Gender Classification, Alexander M. Glandon, Joe Zalameda, Khan M. Iftekharuddin, Gabor F. Fulop (Ed.), David Z. Ting (Ed.), Lucy L. Zheng (Ed.) Jan 2023

Transfer Learning Using Infrared And Optical Full Motion Video Data For Gender Classification, Alexander M. Glandon, Joe Zalameda, Khan M. Iftekharuddin, Gabor F. Fulop (Ed.), David Z. Ting (Ed.), Lucy L. Zheng (Ed.)

Electrical & Computer Engineering Faculty Publications

This work is a review and extension of our ongoing research in human recognition analysis using multimodality motion sensor data. We review our work on hand crafted feature engineering for motion capture skeleton (MoCap) data, from the Air Force Research Lab for human gender followed by depth scan based skeleton extraction using LIDAR data from the Army Night Vision Lab for person identification. We then build on these works to demonstrate a transfer learning sensor fusion approach for using the larger MoCap and smaller LIDAR data for gender classification.


Applying Data Science And Machine Learning To Understand Health Care Transition For Adolescents And Emerging Adults With Special Health Care Needs, Lisamarie Turk Dec 2022

Applying Data Science And Machine Learning To Understand Health Care Transition For Adolescents And Emerging Adults With Special Health Care Needs, Lisamarie Turk

Nursing ETDs

A problem of classification places adolescents and emerging adults with special health care needs among the most at risk for poor or life-threatening health outcomes. This preliminary proof-of-concept study was conducted to determine if phenotypes of health care transition (HCT) for this vulnerable population could be established. Such phenotypes could support development of future studies that require data classifications as input. Mining of electronic health record data and cluster analysis were implemented to identify phenotypes. Subsequently, a machine learning concept model was developed for predicting acute care and medical condition severity. Three clusters were identified and described (Cluster 1, n …


Exploring The Effectiveness Of Multiple-Exemplar Training For Visual Analysis Of Ab-Design Graphs, Verena S. Bethke Jun 2022

Exploring The Effectiveness Of Multiple-Exemplar Training For Visual Analysis Of Ab-Design Graphs, Verena S. Bethke

Dissertations, Theses, and Capstone Projects

In behavior analysis, data are usually analyzed using visual analysis of the graphed data. There are a wide range of methods used to visually analyze data, from a basic ‘textbook’ style approach to the use of visual aids, decision-rubrics, and computer-based approaches. In the literature, there have been some comparisons of the efficacy of different approaches. Visual analysis as a behavior can be taught using a variety of methods, independent of how the skill itself is to be performed. Teaching methods include lecture, online instruction, and equivalence-based instruction. There is not much research on the teaching of visual analysis specifically, …


Toward Suicidal Ideation Detection With Lexical Network Features And Machine Learning, Ulya Bayram, William Lee, Daniel Santel, Ali Minai, Peggy Clark, Tracy Glauser, John Pestian Apr 2022

Toward Suicidal Ideation Detection With Lexical Network Features And Machine Learning, Ulya Bayram, William Lee, Daniel Santel, Ali Minai, Peggy Clark, Tracy Glauser, John Pestian

Northeast Journal of Complex Systems (NEJCS)

In this study, we introduce a new network feature for detecting suicidal ideation from clinical texts and conduct various additional experiments to enrich the state of knowledge. We evaluate statistical features with and without stopwords, use lexical networks for feature extraction and classification, and compare the results with standard machine learning methods using a logistic classifier, a neural network, and a deep learning method. We utilize three text collections. The first two contain transcriptions of interviews conducted by experts with suicidal (n=161 patients that experienced severe ideation) and control subjects (n=153). The third collection consists of interviews conducted by experts …


Exploring Cyberterrorism, Topic Models And Social Networks Of Jihadists Dark Web Forums: A Computational Social Science Approach, Vivian Fiona Guetler Jan 2022

Exploring Cyberterrorism, Topic Models And Social Networks Of Jihadists Dark Web Forums: A Computational Social Science Approach, Vivian Fiona Guetler

Graduate Theses, Dissertations, and Problem Reports

This three-article dissertation focuses on cyber-related topics on terrorist groups, specifically Jihadists’ use of technology, the application of natural language processing, and social networks in analyzing text data derived from terrorists' Dark Web forums. The first article explores cybercrime and cyberterrorism. As technology progresses, it facilitates new forms of behavior, including tech-related crimes known as cybercrime and cyberterrorism. In this article, I provide an analysis of the problems of cybercrime and cyberterrorism within the field of criminology by reviewing existing literature focusing on (a) the issues in defining terrorism, cybercrime, and cyberterrorism, (b) ways that cybercriminals commit a crime in …


Aspect-Based Sentiment Analysis Of Movie Reviews, Samuel Onalaja, Eric Romero, Bosang Yun Dec 2021

Aspect-Based Sentiment Analysis Of Movie Reviews, Samuel Onalaja, Eric Romero, Bosang Yun

SMU Data Science Review

This study investigates a comparison of classification models used to determine aspect based separated text sentiment and predict binary sentiments of movie reviews with genre and aspect specific driving factors. To gain a broader classification analysis, five machine and deep learning algorithms were compared: Logistic Regression (LR), Naive Bayes (NB), Support Vector Machine (SVM), and Recurrent Neural Network Long-Short-Term Memory (RNN LSTM). The various movie aspects that are utilized to separate the sentences are determined through aggregating aspect words from lexicon-base, supervised and unsupervised learning. The driving factors are randomly assigned to various movie aspects and their impact tied to …


Per-Pixel Cloud Cover Classification Of Multispectral Landsat-8 Data, Salome E. Carrasco [*], Torrey J. Wagner, Brent T. Langhals Jun 2021

Per-Pixel Cloud Cover Classification Of Multispectral Landsat-8 Data, Salome E. Carrasco [*], Torrey J. Wagner, Brent T. Langhals

Faculty Publications

Random forest and neural network algorithms are applied to identify cloud cover using 10 of the wavelength bands available in Landsat 8 imagery. The methods classify each pixel into 4 different classes: clear, cloud shadow, light cloud, or cloud. The first method is based on a fully connected neural network with ten input neurons, two hidden layers of 8 and 10 neurons respectively, and a single-neuron output for each class. This type of model is considered with and without L2 regularization applied to the kernel weighting. The final model type is a random forest classifier created from an ensemble of …


Using Machine Learning Methods To Predict The Movement Trajectories Of The Louisiana Black Bear, Daniel Clark, David Shaw, Armando Vela, Shane Weinstock, John Santerre, Joseph D. Clark May 2021

Using Machine Learning Methods To Predict The Movement Trajectories Of The Louisiana Black Bear, Daniel Clark, David Shaw, Armando Vela, Shane Weinstock, John Santerre, Joseph D. Clark

SMU Data Science Review

In 1992, the Louisiana black bear (Ursus americanus luteolus) was placed on the U.S. Endangered Species List. This was due to bear populations in Louisiana being small and isolated enough where their populations couldn’t intersect with other populations to grow. Interchange of individuals between subpopulations of bears in Louisiana is critical to maintain genetic diversity and avoid inbreeding effects. Utilizing GPS (Global Positioning System) data gathered from 31 radio-collared bears from 2010 through 2012, this research will investigate how bears traverse the landscape, which has implications for gene exchange. This paper will leverage machine learning tools to improve upon existing …


Human Fatigue Predictions In Complex Aviation Crew Operational Impact Conditions, Suresh Rangan May 2021

Human Fatigue Predictions In Complex Aviation Crew Operational Impact Conditions, Suresh Rangan

Doctoral Dissertations

In this last decade, several regulatory frameworks across the world in all modes of transportation had brought fatigue and its risk management in operations to the forefront. Of all transportation modes air travel has been the safest means of transportation. Still as part of continuous improvement efforts, regulators are insisting the operators to adopt strong fatigue science and its foundational principles to reinforce safety risk assessment and management. Fatigue risk management is a data driven system that finds a realistic balance between safety and productivity in an organization. This work discusses the effects of mathematical modeling of fatigue and its …


Assessing And Forecasting Chlorophyll Abundances In Minnesota Lake Using Remote Sensing And Statistical Approaches, Ben Von Korff Jan 2021

Assessing And Forecasting Chlorophyll Abundances In Minnesota Lake Using Remote Sensing And Statistical Approaches, Ben Von Korff

All Graduate Theses, Dissertations, and Other Capstone Projects

Harmful algae blooms (HABs) can negatively impact water quality, lake aesthetics, and can harm human and animal health. However, monitoring for HABs is rare in Minnesota. Detecting blooms which can vary spatially and may only be present briefly is challenging, so expanding monitoring in Minnesota would require the use of new and cost efficient technologies. Unmanned aerial vehicles (UAVs) were used for bloom mapping using RGB and near-infrared imagery. Real time monitoring was conducted in Bass Lake, in Faribault County, MN using trail cameras. Time series forecasting was conducted with high frequency chlorophyll-a data from a water quality sonde. Normalized …


Using Data Analytics To Predict Students Score, Nang Laik Ma, Gim Hong Chua Nov 2020

Using Data Analytics To Predict Students Score, Nang Laik Ma, Gim Hong Chua

Research Collection School Of Computing and Information Systems

Education is very important to Singapore, and the government has continued to invest heavily in our education system to become one of the world-class systems today. A strong foundation of Science, Technology, Engineering, and Mathematics (STEM) was what underpinned Singapore's development over the past 50 years. PISA is a triennial international survey that evaluates education systems worldwide by testing the skills and knowledge of 15-year-old students who are nearing the end of compulsory education. In this paper, the authors used the PISA data from 2012 and 2015 and developed machine learning techniques to predictive the students' scores and understand the …


Gaining Computational Insight Into Psychological Data: Applications Of Machine Learning With Eating Disorders And Autism Spectrum Disorder, Natalia Rosenfield Aug 2020

Gaining Computational Insight Into Psychological Data: Applications Of Machine Learning With Eating Disorders And Autism Spectrum Disorder, Natalia Rosenfield

Computational and Data Sciences (PhD) Dissertations

Over the past 100 years, assessment tools have been developed that allow us to explore mental and behavioral processes that could not be measured before. However, conventional statistical models used for psychological data are lacking in thoroughness and predictability. This provides a perfect opportunity to use machine learning to study the data in a novel way. In this paper, we present examples of using machine learning techniques with data in three areas: eating disorders, body satisfaction, and Autism Spectrum Disorder (ASD). We explore clustering algorithms as well as virtual reality (VR).

Our first study employs the k-means clustering algorithm to …


Disaster Damage Categorization Applying Satellite Images And Machine Learning Algorithm, Farinaz Sabz Ali Pour, Adrian Gheorghe Jan 2020

Disaster Damage Categorization Applying Satellite Images And Machine Learning Algorithm, Farinaz Sabz Ali Pour, Adrian Gheorghe

Engineering Management & Systems Engineering Faculty Publications

Special information has a significant role in disaster management. Land cover mapping can detect short- and long-term changes and monitor the vulnerable habitats. It is an effective evaluation to be included in the disaster management system to protect the conservation areas. The critical visual and statistical information presented to the decision-makers can help in mitigation or adaption before crossing a threshold. This paper aims to contribute in the academic and the practice aspects by offering a potential solution to enhance the disaster data source effectiveness. The key research question that the authors try to answer in this paper is how …