Open Access. Powered by Scholars. Published by Universities.®

Data Science Commons

1,471 Full-Text Articles 2,939 Authors 273,342 Downloads 189 Institutions

All Articles in Data Science

The Impact Of The COVID-19 Pandemic On Faculty Productivity And Gender Inequalities In STEM Disciplines, Monnie McGee, Raag Patel, Roslyn Smith, Satvik Ajmera 2023 Southern Methodist University

SMU Data Science Review

Women and minorities within STEM disciplines historically encounter obstacles in academic advancement, a situation compounded by the COVID-19 pandemic due to the imposition of additional responsibilities like caregiving. This study meticulously probes into the pandemic's influence on traditional academic productivity metrics – specifically publication and submission frequency, citation volume, and leadership in scholarly entities, by employing Natural Language Processing to extract and analyze data from key journals within various scientific domains. A critical revelation from the research indicates a notable downturn in publication activity during 2021, potentially attributed to pandemic-induced disruptions, with a compensatory surge observed in 2022. Although a …


Predicting Land Reclamation Of Bond Released Surface Mines, Kendall Scott, Austin Webb, Tadd Backus, Robert Slater 2023 Southern Methodist University

SMU Data Science Review

Accurately measuring the recovery of released surface mines in the United States poses crucial challenges. This study aims to develop a prediction of land classification that considers various environmental and coal mine variables. By utilizing this prediction, researchers and environmentalists (specifically Appalachian Voices, the group heading this research) can better understand the relevant factors for successful reclamation. Efficient management of mine recovery is essential for environmental sustainability, regulatory compliance, and resource utilization. This study focuses on the Appalachian Forest area, which risks becoming a net carbon source (a place that emits more carbon than it absorbs) due to mine recovery. …


Utilizing Computer Vision For Automated Cellular Microscopy, Ahmed Awadallah, Ryan Bass, James Burke, Robert Price, John Santerre 2023 Southern Methodist University

SMU Data Science Review

Post-acquisition data analysis of microscopy images is a vital yet time-consuming process for researchers. Quantitative fields such as biology and microbiology often require using images as primary data sources. Finding methods to automate this process would increase throughput, quality, and reproducibility. This research aims to provide a novel end-to-end pipeline that reduces the workload on researchers in identifying cell cytoplasm and nuclei while creating a process that can scale to the researcher's needs. The proposed methodology utilizes various image-processing techniques to rapidly identify the boundaries of cells and nuclei, including filtering, thresholding, and deep learning. The results …


CM-II Meditation As An Intervention To Reduce Stress And Improve Attention: A Study Of ML Detection, Spectral Analysis, And HRV Metrics, Sreekanth Gopi 2023 Kennesaw State University

Master of Science in Computer Science Theses

Students frequently face heightened stress due to academic and social pressures, particularly in demanding fields like computer science and engineering. These challenges are often associated with serious mental health issues, including ADHD (Attention Deficit Hyperactivity Disorder), depression, and an increased risk of suicide. The average student attention span has notably decreased from 2½ minutes to just 47 seconds, and now it typically takes about 25 minutes to switch attention to a new task (Mark, 2023). Research findings suggest that over 95% of individuals who die by suicide have been diagnosed with depression (Shahtahmasebi, 2013), and almost 20% of students …


Study Of Augmentations On Historical Manuscripts Using TrOCR, Erez Meoded 2023 Mississippi State University

Theses and Dissertations

Historical manuscripts are an essential source of original content. For many reasons, it is hard to recognize these manuscripts as text. This thesis used a state-of-the-art Handwritten Text Recognizer, TrOCR, to recognize a 16th-century manuscript. TrOCR uses a vision transformer to encode the input images and a language transformer to decode them back to text. We showed that carefully preprocessed images and designed augmentations can improve the performance of TrOCR. We suggest an ensemble of augmented models to achieve an even better performance.


An Investigation Into Applications Of Canonical Polyadic Decomposition & Ensemble Learning In Forecasting Thermal Data Streams In Direct Laser Deposition Processes, Jonathan Storey 2023 Mississippi State University

Theses and Dissertations

Additive manufacturing (AM) is a process of creating objects from 3D model data by adding layers of material. AM technologies present several advantages compared to traditional manufacturing technologies, such as producing less material waste and being capable of producing parts with greater geometric complexity. However, deficiencies in the printing process due to high process uncertainty can affect the microstructural properties of a fabricated part leading to defects. In metal AM, previous studies have linked defects in parts with melt pool temperature fluctuations, with the size of the melt pool and the scan pattern being key factors associated with part defects. …


Unmasking Shadows: Unraveling Crime Patterns In NYC's Boroughs, Jack Hachicho, Muhammad Hassan Butt 2023 CUNY New York City College of Technology

Publications and Research

New York City's crime dynamics have been on the rise for decades. Brooklyn and The Bronx have been disproportionately affected. This research aims to understand the crime landscape in these boroughs to formulate effective policies. Using crime data from official sources, statistical analyses, and data visualizations, the study identifies patterns and trends. The data encompasses over 400,000 reported incidents collected over the past 10 years, meticulously categorized by borough, crime type, and demographic information. Brooklyn has the highest overall crime rate, followed by The Bronx. Most shooting victims are Black. This highlights the need for holistic community programs to address …


Climate Change Impact On Bridge Scour Risk In NY State: A GIS-Based Risk Analysis Model, Muhammad Hassan Butt 2023 CUNY New York City College of Technology

Publications and Research

Bridge scour, the primary cause of bridge failure in the United States, escalates post-severe storms, necessitating effective mitigation. This study employs a GIS-based risk analysis model to assess climate change's impact on bridge scour and associated risks in New York State. Data from the National Bridge Inventory, climate hazard maps, and geospatial data are integrated.


Les Expositions Turnus, Une Page D’Histoire Transnationale Des Beaux-Arts En Suisse À La Fin Du XIXe Siècle. Et Comment Découvrir Les Humanités Numériques, Béatrice Joyeux-Prunel 2023 Université de Genève, Switzerland

Artl@s Bulletin

This article presents the work of the University of Geneva's introductory digital humanities class on the Turnus exhibitions in Switzerland from the 1840s onward. Nearly 50 catalogues were transcribed, described, and structured using Python scripts, then geolocated. The data were added to BasArt, the Artl@s global repository of exhibition catalogues (https://artlas.huma-num.fr/map). They shed light on the early years of these exhibitions and on their local, federal, and international dynamics. The Turnus was a hub for Swiss artists, and even a springboard to the European art market.


High-Performance Computing In Covariant Loop Quantum Gravity, Pietropaolo Frisoni 2023 The University of Western Ontario

Electronic Thesis and Dissertation Repository

This Ph.D. thesis presents a compilation of the scientific papers I published over the last three years during my Ph.D. in loop quantum gravity (LQG). First, we comprehensively introduce spinfoam calculations with a practical pedagogical paper. We highlight LQG's unique features and mathematical formalism and emphasize the computational complexities associated with its calculations. The subsequent articles delve into specific aspects of employing high-performance computing (HPC) in LQG research. We discuss the results obtained by applying numerical methods to studying spinfoams' infrared divergences, or “bubbles”. This research direction is crucial to define the continuum limit of LQG properly. We investigate the …


Development Of An App For The Kalamazoo Nature Center, Ernest Au 2023 Western Michigan University

Honors Theses

Kalamazoo Nature Center (KNC), which has been recognized by its peers as one of the top nature centers in the country, is home to over 14 miles of hiking trails winding through woods, wetlands, and prairies. There are numerous places/plots in KNC that have an interesting and impressive history besides being home to a variety of animals and hundreds of wildflowers and other plant life. To improve the visitor’s experience at KNC, we will design a software app via the senior capstone project at the department of Computer Science at WMU. As the first step towards establishing a reference model …


Teaching Reproducibility To First Year College Students: Reflections From An Introductory Data Science Course, Brennan L. Bean 2023 Utah State University

Journal on Empowering Teaching Excellence

Modern technology threatens traditional modes of classroom assessment by providing students with automated ways to write essays and take exams. At the same time, modern technology continues to expand the accessibility of computational tools that promise to increase the potential scope and quality of class projects. This paper presents a case study where students are asked to complete a “reproducible” final project in an introductory data science course using the R programming language. A reproducible project is one where an instructor can easily regenerate the results and conclusions from the submitted …


A Bridge Between Graph Neural Networks And Transformers: Positional Encodings As Node Embeddings, Bright Kwaku Manu 2023 East Tennessee State University

Electronic Theses and Dissertations

Graph Neural Networks (GNNs) and Transformers are powerful frameworks for machine learning tasks. Although they evolved separately in diverse fields, current research has revealed some similarities and links between them. This work focuses on bridging the gap between GNNs and Transformers by offering a uniform framework that highlights their similarities and distinctions. We perform positional encodings and identify key properties that make the positional encodings node embeddings. We found that the properties of expressiveness, efficiency, and interpretability were achieved in the process. We saw that it is possible to use positional encodings as node embeddings, which can be …


Convolution And Autoencoders Applied To Nonlinear Differential Equations, Noah Borquaye 2023 East Tennessee State University

Electronic Theses and Dissertations

Autoencoders, a type of artificial neural network, have gained recognition among researchers in various fields, especially machine learning, due to their vast applications in data representations from inputs. Recently, researchers have explored extending the application of autoencoders to solving nonlinear differential equations. Algorithms and methods employed in an autoencoder framework include sparse identification of nonlinear dynamics (SINDy), dynamic mode decomposition (DMD), Koopman operator theory, and singular value decomposition (SVD). These approaches use matrix multiplication to represent linear transformations. However, machine learning algorithms often use convolution to represent linear transformations. In our work, we modify these approaches to …


Random Variable Spaces: Mathematical Properties And An Extension To Programming Computable Functions, Mohammed Kurd-Misto 2023 Chapman University

Computational and Data Sciences (PhD) Dissertations

This dissertation aims to extend the boundaries of Programming Computable Functions (PCF) by introducing a novel collection of categories referred to as Random Variable Spaces. Originating as a generalization of Quasi-Borel Spaces, Random Variable Spaces are rigorously defined as categories where objects are sets paired with a collection of random variables from an underlying measurable space. These spaces offer a theoretical foundation for extending PCF to natively handle stochastic elements.

The dissertation is structured into seven chapters that provide a multi-disciplinary background, from PCF and Measure Theory to Category Theory with special attention to Monads and the Giry Monad. The …


Generalized Differentiable Neural Architecture Search With Performance And Stability Improvements, Emily J. Herron 2023 University of Tennessee, Knoxville

Doctoral Dissertations

This work introduces improvements to the stability and generalizability of Cyclic DARTS (CDARTS). CDARTS is a Differentiable Architecture Search (DARTS)-based approach to neural architecture search (NAS) that uses a cyclic feedback mechanism to train search and evaluation networks concurrently, thereby optimizing the search process by enforcing that the networks produce similar outputs. However, the dissimilarity between the loss functions used by the evaluation networks during the search and retraining phases makes the search-phase evaluation network a sub-optimal proxy for the final evaluation network utilized during retraining. ICDARTS, a revised algorithm that reformulates the search phase loss functions to ensure …


Parameter Estimation For Patient Enrollment In Clinical Trials, Junyan Liu 2023 William & Mary

Undergraduate Honors Theses

In this paper, we study the Poisson-gamma model for recruitment time in clinical trials. We proved several properties of this model that match our intuitions from a reliability perspective, did simulations on this model, and used different optimization methods to estimate the parameters. Although the behaviors of the optimization methods were unfavorable and unstable, we identified certain conditions and provided potential explanations for this phenomenon and further insights into the Poisson-gamma model.


Wavelet Compression As An Observational Operator In Data Assimilation Systems For Sea Surface Temperature, Bradley J. Sciacca 2023 University of New Orleans, New Orleans

University of New Orleans Theses and Dissertations

The ocean remains severely under-observed, in part due to its sheer size. Containing nearly billion of water with most of the subsurface being invisible because water is extremely difficult to penetrate using electromagnetic radiation, as is typically used by satellite measuring instruments. For this reason, most observations of the ocean have very low spatial-temporal coverage to get a broad capture of the ocean’s features. However, recent “dense but patchy” data have increased the availability of high-resolution – low spatial coverage observations. These novel data sets have motivated research into multi-scale data assimilation methods. Here, we demonstrate a new assimilation approach …


Exploration And Statistical Modeling Of Profit, Caleb Gibson 2023 East Tennessee State University

Undergraduate Honors Theses

For any company involved in sales, maximization of profit is the driving force that guides all decision-making. Many factors can influence how profitable a company can be, including external factors like changes in inflation or consumer demand and internal factors like pricing and product cost. By understanding specific trends in its own internal data, a company can readily identify problem areas or potential growth opportunities to help increase profitability.

In this discussion, we use an extensive data set to examine how a company might analyze their own data to identify potential changes the company might investigate to drive better performance. Based …


Exact Models, Heuristics, And Supervised Learning Approaches For Vehicle Routing Problems, Zefeng Lyu 2023 University of Tennessee, Knoxville

Doctoral Dissertations

This dissertation presents contributions to the field of vehicle routing problems by utilizing exact methods, heuristic approaches, and the integration of machine learning with traditional algorithms. The research is organized into three main chapters, each dedicated to a specific routing problem and a unique methodology. The first chapter addresses the Pickup and Delivery Problem with Transshipments and Time Windows, a variant that permits product transfers between vehicles to enhance logistics flexibility and reduce costs. To solve this problem, we propose an efficient mixed-integer linear programming model that has been shown to outperform existing ones. The second chapter discusses a practical …

