Open Access. Powered by Scholars. Published by Universities.®

Data Science Commons

1,514 Full-Text Articles 3,028 Authors 435,013 Downloads 190 Institutions

All Articles in Data Science

Reducing Food Scarcity: The Benefits Of Urban Farming, S.A. Claudell, Emilio Mejia 2023 Brigham Young University

Journal of Nonprofit Innovation

Urban farming can enhance the lives of communities and help reduce food scarcity. This paper presents a conceptual prototype of an efficient urban farming community that can be scaled for a single apartment building or an entire community across all global geoeconomic regions, including densely populated cities and rural, developing towns and communities. When deployed in coordination with smart crop choices, local farm support, and efficient transportation, the result is not just sustainability but also increased access to fresh produce, optimized nutritional value, elimination of ‘forever chemicals’, reduced transportation costs, and global environmental benefits.

Imagine Doris, who is …


Tiny Machine Learning For Underwater Image Enhancement: Pruning And Quantization Approach, Khaled Nagaty, Andreas Pester 2023 The British University in Egypt

Computer Science

Many people have expressed an interest in underwater image processing in a variety of fields, including underwater vehicle control, archaeology, marine biological studies, etc. Underwater exploration is becoming an increasingly important element of our lives, with applications ranging from underwater marine and creature research to pipeline and communication logistics, military use, touristic and entertainment use. Underwater images suffer from poor visibility, distortion, and poor quality for a variety of causes, including light propagation. The major issue arises when these images must be captured at depths greater than 500 feet and artificial lighting needs to be provided. Efficient algorithms and models …


Interpretable Word-Level Sentiment Analysis With Attention-Based Multiple Instance Classification Models, Chenyu Yang 2023 Southern Methodist University

Statistical Science Theses and Dissertations

In this study, our main objective is to tackle the black-box nature of popular machine learning models in sentiment analysis and enhance model interpretability. We aim to gain more insight into the decision-making process of sentiment analysis models, which is often obscure in those complex models. To achieve this goal, we introduce two word-level sentiment analysis models.

The first model is called the attention-based multiple instance classification (AMIC) model. It combines the transparent model structure of multiple instance classification and the self-attention mechanism in deep learning to incorporate the contextual information from documents. As demonstrated by a wine review dataset …


Roadside LiDAR Data Processing For Intelligent Transportation System, Md Parvez Mollah 2023 University of New Mexico

Computer Science ETDs

Roadside LiDAR (Light Detection and Ranging) sensors have recently been explored for Intelligent Transportation Systems, with the aim of safer and faster traffic management and vehicular operations. However, massive data volume, occlusion, and limited viewing angles are significant obstacles to the widespread use of roadside LiDARs. In this dissertation, we address three major challenges to enable applications of Intelligent Transportation Systems through roadside LiDAR data: (i) real-time transmission of the massive point-cloud data from the roadside LiDAR devices to the cloud using a 5G network, (ii) mitigating the sensor occlusion problem to increase coverage and detect events that occur in occluded regions of a sensor, …


Utilizing Multitask Transfer Learning For Sonographic Rheumatoid Arthritis Synovitis Grading, Jordan Marie Claire Sanders 2023 Embry-Riddle Aeronautical University

Doctoral Dissertations and Master's Theses

Classifying the four sonographic Rheumatoid Arthritis (RA) synovitis grades (Grade 0, Grade 1, Grade 2, and Grade 3) is a difficult problem due to the complexity of the relevant markers. Therefore, the current research proposes a Multitask Transfer Learning (MTL) framework for sonographic RA synovitis grading of Ultrasound (US) images in Brightness mode (B-Mode) and Power Doppler mode.

In the medical community, the lack of reliability of scoring these images has been an issue and reason for concern for doctors and other medical practitioners. The human/machine variability across the acquisition procedure of these US images creates an additional challenge that …


Learning Mortality Risk For COVID-19 Using Machine Learning And Statistical Methods, Shaoshi Zhang 2023 Western University

Electronic Thesis and Dissertation Repository

This research investigates the mortality risk of COVID-19 patients across different variant waves, using the data from Centers for Disease Control and Prevention (CDC) websites. By analyzing the available data, including patient medical records, vaccination rates, and hospital capacities, we aim to discern patterns and factors associated with COVID-19-related deaths.

To explore features linked to COVID-19 mortality, we employ different techniques such as Filter, Wrapper, and Embedded methods for feature selection. Furthermore, we apply various machine learning methods, including support vector machines, decision trees, random forests, logistic regression, K-nearest neighbours, naïve Bayes methods, and artificial neural networks, to uncover underlying …
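As a rough illustration of what a Filter method does, the sketch below ranks features by the absolute Pearson correlation between each feature and a binary outcome. It is not the thesis's procedure: the feature names, the toy values, and the helper names `pearson` and `filter_rank` are all made up for this example, and correlation is only one of several scoring rules a Filter method might use.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def filter_rank(features, target):
    """Rank feature names by |correlation with target|, strongest first."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: 1 in `died` marks a death, 0 a survivor.
features = {
    "age": [30, 45, 60, 75, 80, 50],
    "noise": [1, 2, 1, 2, 1, 2],
    "constant": [1, 1, 1, 1, 1, 1],
}
died = [0, 0, 1, 1, 1, 0]

ranking = filter_rank(features, died)
print(ranking)
```

A Filter method scores each feature independently of any downstream model, which is what separates it from Wrapper and Embedded methods; the top-ranked names would then be passed on to whichever classifier is being trained.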


Ohio Recovery Housing: Resident Risk And Outcomes Assessment, Elyjiah Potter, Bivin Sadler 2023 Southern Methodist University

SMU Data Science Review

Addiction and substance abuse disorder is a significant problem in the United States. Over the past two decades, the United States has faced a boom in substance abuse, which has resulted in an increase in death and disruption of families across the nation. The State of Ohio has been particularly hard hit by the crisis, with overdose rates nearly double the national average. Established in the mid-1970s, Sober Living Housing is an alcohol and substance use recovery model emphasizing personal responsibility, sober living, and community support. This model has been adopted by the Ohio Recovery Housing organization, which seeks …


Deep Learning Image Analysis To Isolate And Characterize Different Stages Of S-Phase In Human Cells, Kevin A. Boyd, Rudranil Mitra, John Santerre, Christopher L. Sansam 2023 SMU

SMU Data Science Review

This research used deep learning for image analysis to isolate and characterize distinct DNA replication patterns in human cells. By leveraging high-resolution microscopy images of multiple cells stained with 5-Ethynyl-2′-deoxyuridine (EdU), a replication marker, this analysis utilized Convolutional Neural Networks (CNNs) to perform image segmentation and to provide robust and reliable classification results. First, multiple cells in a field of focus were identified using a pretrained CNN called Cellpose. After the location of each cell in the image was identified, a Python script was created to crop out each cell into individual .tif files. After careful annotation, a CNN was …
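The cropping step described above can be sketched roughly as follows. This is not the authors' script: it assumes the segmentation is available as an integer label mask (0 for background, 1..N for individual cells), uses plain nested lists rather than real image arrays, and leaves out writing the .tif files. The names `bounding_boxes` and `crop_cells` are hypothetical.

```python
def bounding_boxes(mask):
    """Map each nonzero label to its (row_min, row_max, col_min, col_max)."""
    boxes = {}
    for r, row in enumerate(mask):
        for c, label in enumerate(row):
            if label == 0:  # 0 is assumed to mean background
                continue
            if label not in boxes:
                boxes[label] = [r, r, c, c]
            else:
                box = boxes[label]
                box[0] = min(box[0], r)
                box[1] = max(box[1], r)
                box[2] = min(box[2], c)
                box[3] = max(box[3], c)
    return {label: tuple(box) for label, box in boxes.items()}


def crop_cells(image, mask):
    """Return one cropped sub-image per cell; pixels outside the cell become 0."""
    crops = {}
    for label, (r0, r1, c0, c1) in bounding_boxes(mask).items():
        crops[label] = [
            [image[r][c] if mask[r][c] == label else 0 for c in range(c0, c1 + 1)]
            for r in range(r0, r1 + 1)
        ]
    return crops


# Toy 3x4 "image" and a label mask containing two cells.
image = [
    [10, 11, 12, 13],
    [14, 15, 16, 17],
    [18, 19, 20, 21],
]
mask = [
    [1, 1, 0, 0],
    [0, 1, 0, 2],
    [0, 0, 2, 2],
]

crops = crop_cells(image, mask)
print(crops[1])  # [[10, 11], [0, 15]]
print(crops[2])  # [[0, 17], [20, 21]]
```

Zeroing pixels that fall inside the bounding box but outside the cell keeps neighbouring cells out of each crop; whether the original script masked or simply cropped is not stated in the abstract.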


Investigation Into A Practical Application Of Reinforcement Learning For The Stock Market, Philip Traxler, Sadik Aman, Will Rogers, Allyn Okun 2023 Southern Methodist University

SMU Data Science Review

A major problem in the financial industry is adapting trading strategies at the same rate the market evolves. This paper proposes a solution using existing Reinforcement Learning libraries to help find new strategies at a practical scale. Using a wide domain of ticker symbols, an algorithm is trained in an environment that better represents reality. The supplied decision-making algorithm is tested using recorded data from the U.S. stock market from 2000 through 2022. The results of this research show that existing techniques are statistically better than making decisions at random. With this result, this research shows …


Differentiation Of Human, Dog, And Cat Hair Fibers Using DART TOFMS And Machine Learning, Laura Ahumada, Erin R. McClure-Price, Chad Kwong, Edgard O. Espinoza, John Santerre 2023 Southern Methodist University

SMU Data Science Review

Hair is found in over 90% of crime scenes and has long been analyzed as trace evidence. However, recent reviews of traditional hair fiber analysis techniques, primarily morphological examination, have cast doubt on its reliability. To address these concerns, this study employed machine learning algorithms, specifically Linear Discriminant Analysis (LDA) and Random Forest, on Direct Analysis in Real Time time-of-flight mass spectra collected from human, cat, and dog hair samples. The objective was to develop a chemistry- and statistics-based classification method for unbiased taxonomic identification of hair. The results of the study showed that LDA and Random Forest were highly …


Impact Of COVID-19 On Recruitment Of High School Athletes To DI Track And Field, Christopher Haub, Jon Paugh, Alonso Salcido, Monnie McGee 2023 Southern Methodist University

SMU Data Science Review

Due to COVID-19, in the spring of 2020, the NCAA gave scholarship athletes an extra year of eligibility but did not increase the number of scholarships a school could issue. This potentially led to increased competition for scholarships as coaches could choose between retaining athletes or recruiting new ones. Furthermore, the Spring 2020 track and field season for high school seniors ended early, limiting high school athletes’ chances to post their best scores and interrupting student-to-college interaction. This research looks specifically at the impact of COVID-19, and the resulting NCAA policy changes, on the recruitment to DI …


A Prompt Engineering Approach To Creating Automated Commentary For Microsoft Self-Help Documentation Metric Reports Using ChatGPT, Ryan Herrin, Luke Stodgel, Brian Raffety 2023 Southern Methodist University

SMU Data Science Review

Microsoft collects an immense amount of data from the users of its product self-help documentation. Employees use this data to identify performance trends in these self-help articles and measure their impact on business Key Performance Indicators (KPIs). Microsoft uses various tools like Power BI and Python to analyze this data. The problem is that the analysis and findings are summarized manually. Therefore, this research will improve upon the current analysis methods by applying the latest prompt engineering practices and the power of ChatGPT's large language models (LLMs). Using VBA code, Microsoft Excel, and the ChatGPT API as an Excel add-in, this research …


The Impact Of The COVID-19 Pandemic On Faculty Productivity And Gender Inequalities In STEM Disciplines, Monnie McGee, Raag Patel, Roslyn Smith, Satvik Ajmera 2023 Southern Methodist University

SMU Data Science Review

Women and minorities within STEM disciplines historically encounter obstacles in academic advancement, a situation compounded by the COVID-19 pandemic due to the imposition of additional responsibilities like caregiving. This study meticulously probes into the pandemic's influence on traditional academic productivity metrics – specifically publication and submission frequency, citation volume, and leadership in scholarly entities, by employing Natural Language Processing to extract and analyze data from key journals within various scientific domains. A critical revelation from the research indicates a notable downturn in publication activity during 2021, potentially attributed to pandemic-induced disruptions, with a compensatory surge observed in 2022. Although a …


Predicting Land Reclamation Of Bond Released Surface Mines, Kendall Scott, Austin Webb, Tadd Backus, Robert Slater 2023 Southern Methodist University

SMU Data Science Review

Accurately measuring the recovery of released surface mines in the United States poses crucial challenges. This study aims to develop a prediction of land classification that considers various environmental and coal mine variables. By utilizing this prediction, researchers and environmentalists (specifically Appalachian Voices, the group heading this research) can better understand the relevant factors for successful reclamation. Efficient management of mine recovery is essential for environmental sustainability, regulatory compliance, and resource utilization. This study focuses on the Appalachian Forest area, which risks becoming a net carbon source (a place that emits more carbon than it absorbs) due to mine recovery. …


Utilizing Computer Vision For Automated Cellular Microscopy, Ahmed Awadallah, Ryan Bass, James Burke, Robert Price, John Santerre 2023 Southern Methodist University

SMU Data Science Review

Post-acquisition data analysis of microscopy images is a vital yet time-consuming process for researchers. Quantitative fields such as biology and microbiology often require using images as primary data sources. Finding methods to automate this process would increase the throughput, quality, and reproducibility. This research aims to provide a novel end-to-end pipeline that reduces the workload on researchers in identifying cell cytoplasm and nuclei while creating a process that can scale to the researcher's needs. The proposed methodology utilizes various image-processing techniques to rapidly identify the boundaries of cells and nuclei, including filtering, thresholding, and deep learning. The results …


CM-II Meditation As An Intervention To Reduce Stress And Improve Attention: A Study Of ML Detection, Spectral Analysis, And HRV Metrics, Sreekanth Gopi 2023 Kennesaw State University

Master of Science in Computer Science Theses

Students frequently face heightened stress due to academic and social pressures, particularly in demanding fields like computer science and engineering. These challenges are often associated with serious mental health issues, including ADHD (Attention Deficit Hyperactivity Disorder), depression, and an increased risk of suicide. The average student attention span has notably decreased from 2½ minutes to just 47 seconds, and now it typically takes about 25 minutes to switch attention to a new task (Mark, 2023). Research findings suggest that over 95% of individuals who die by suicide have been diagnosed with depression (Shahtahmasebi, 2013), and almost 20% of students …


Study Of Augmentations On Historical Manuscripts Using TrOCR, Erez Meoded 2023 Mississippi State University

Theses and Dissertations

Historical manuscripts are an essential source of original content. For many reasons, it is hard to recognize these manuscripts as text. This thesis used a state-of-the-art Handwritten Text Recognizer, TrOCR, to recognize a 16th-century manuscript. TrOCR uses a vision transformer to encode the input images and a language transformer to decode them back to text. We showed that carefully preprocessed images and designed augmentations can improve the performance of TrOCR. We suggest an ensemble of augmented models to achieve an even better performance.


An Investigation Into Applications Of Canonical Polyadic Decomposition & Ensemble Learning In Forecasting Thermal Data Streams In Direct Laser Deposition Processes, Jonathan Storey 2023 Mississippi State University

Theses and Dissertations

Additive manufacturing (AM) is a process of creating objects from 3D model data by adding layers of material. AM technologies present several advantages compared to traditional manufacturing technologies, such as producing less material waste and being capable of producing parts with greater geometric complexity. However, deficiencies in the printing process due to high process uncertainty can affect the microstructural properties of a fabricated part leading to defects. In metal AM, previous studies have linked defects in parts with melt pool temperature fluctuations, with the size of the melt pool and the scan pattern being key factors associated with part defects. …


Unmasking Shadows: Unraveling Crime Patterns In NYC's Boroughs, Jack Hachicho, Muhammad Hassan Butt 2023 CUNY New York City College of Technology

Publications and Research

New York City's crime dynamics have been on the rise for decades. Brooklyn and The Bronx have been disproportionately affected. This research aims to understand the crime landscape in these boroughs to formulate effective policies. Using crime data from official sources, statistical analyses, and data visualizations, the study identifies patterns and trends. The data encompasses over 400,000 reported incidents collected over the past 10 years, meticulously categorized by borough, crime type, and demographic information. Brooklyn has the highest overall crime rate, followed by The Bronx. Most shooting victims are Black. This highlights the need for holistic community programs to address …


Climate Change Impact On Bridge Scour Risk In NY State: A GIS-Based Risk Analysis Model, Muhammad Hassan Butt 2023 CUNY New York City College of Technology

Publications and Research

Bridge scour, the primary cause of bridge failure in the United States, escalates post-severe storms, necessitating effective mitigation. This study employs a GIS-based risk analysis model to assess climate change's impact on bridge scour and associated risks in New York State. Data from the National Bridge Inventory, climate hazard maps, and geospatial data are integrated.

