Open Access. Powered by Scholars. Published by Universities.®

Data Science Commons

1,471 Full-Text Articles 2,939 Authors 273,342 Downloads 189 Institutions

All Articles in Data Science

A NLP Approach To Automating The Generation Of Surveys For Market Research, Anav Chug 2024 Georgia Southern University

Honors College Theses

Market research is vital but includes activities that are often laborious and time-consuming. Survey questionnaires are one possible output of the process, and market researchers spend a lot of time manually developing questions for focus groups. The proposed research aims to develop a software prototype that uses Natural Language Processing (NLP) to automate the generation of survey questions for market research. The software uses a pre-trained OpenAI language model to generate multiple-choice survey questions based on a given product prompt, sends them to a targeted email list, and provides a real-time analysis of the responses …


A Framework That Explores The Cognitive Load Of CS1 Assignments Using Pausing Behavior, Joshua O. Urry 2024 Utah State University

All Graduate Theses and Dissertations, Fall 2023 to Present

Pausing behavior in introductory Computer Science (CS1) courses has been related to a student’s performance in the course and could be linked to a student’s cognitive load, or assignment difficulty. Having an objective measure of the cognitive load would be beneficial to course instructors as it would help them design assignments that are not too difficult. Two studies are presented in this work. The first study uses Cognitive Load Theory and Vygotsky’s Zone of Proximal Development as a theoretical framework to analyze pause times between keystrokes to better understand what types of assignments need more educational support than others. The …
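The pause-time measurement mentioned in the abstract above can be illustrated with a minimal sketch. It is not the thesis's method: the function names, the sample timestamps, and the 2000 ms cutoff are all hypothetical choices made only for illustration.

```python
from statistics import median


def pause_times(timestamps_ms):
    """Return the gaps (in ms) between consecutive keystroke timestamps."""
    return [later - earlier for earlier, later in zip(timestamps_ms, timestamps_ms[1:])]


def long_pauses(timestamps_ms, threshold_ms=2000):
    """Return only the gaps at or above threshold_ms (an assumed cutoff)."""
    return [gap for gap in pause_times(timestamps_ms) if gap >= threshold_ms]


# Hypothetical keystroke log: milliseconds since the start of a session.
keystrokes = [0, 150, 320, 2900, 3050, 9100, 9250]

gaps = pause_times(keystrokes)
print("all gaps:", gaps)
print("median gap:", median(gaps))
print("long pauses:", long_pauses(keystrokes))
```

Comparing summaries such as the median gap or the number of long pauses across assignments is one simple way such data might be compared; any real analysis would need the thresholds and statistics defined in the thesis itself.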


Data Engineering: Building Software Efficiency In Medium To Large Organizations, Alessandro De La Torre 2024 Whittier College

Whittier Scholars Program

The introduction of PoetHQ, a mobile application, offers an economical strategy for colleges, potentially ushering in significant cost savings. These savings could be redirected towards enhancing academic programs and services, enriching the educational landscape for students. PoetHQ aims to democratize access to crucial software, effectively removing financial barriers and facilitating a richer educational experience. By providing an efficient software solution that reduces organizational overhead while maximizing accessibility for students, the project highlights the essential role of equitable education and resource optimization within academic institutions.


Visualizing NFL Player Metrics, Jayson Rhea 2024 Southern Adventist University

Campus Research Day

This project is dedicated to reshaping the exploration of NFL player data. Tailored for sports analysts and fantasy football managers, the goal is to deliver convenience through seamless data navigation and precise filtering through an interactive dashboard. In contrast to the static formats found on the NFL website and ESPN, this dynamic interface offers interactive visualizations, empowering users to effortlessly compare data. These comparisons can be used to draw quick conclusions about player performance.


Techniques To Detect Fake Profiles On Social Media Using The New Age Algorithms – A Survey, A K M Rubaiyat Reza Habib, Edidiong Elijah Akpan 2024 Arkansas Tech University

ATU Research Symposium

This research explores the growing issue of fake accounts in Online Social Networks [OSNs]. While platforms like Twitter, Instagram, and Facebook foster connections, their lax authentication measures have attracted many scammers and cybercriminals. Fake profiles conduct malicious activities, such as phishing, spreading misinformation, and inciting social discord. The consequences range from cyberbullying to deceptive commercial practices. Detecting fake profiles manually is often challenging and causes considerable stress and trust issues for the users. Typically, a social media user scrutinizes various elements like the profile picture, bio, and shared posts to identify fake profiles. These evaluations sometimes lead users to conclude …


Accessing Advanced National Supercomputing And Storage Resources For Computational Research, Ramazan Aygun 2024 Kennesaw State University

All Things Open

This presentation will cover ACCESS (Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support) and Kennesaw State University's involvement in the Open Science Data Federation program as a data origin to help researchers and educators, with or without supporting grants, utilize the nation’s advanced computing systems and services. ACCESS, a program established and funded by the National Science Foundation, is an ecosystem with capabilities for new modes of research and further democratizing participation. The presentation covers how to apply for allocations on ACCESS. The last part of the presentation will briefly explain the Open Science Data Federation and Kennesaw State University's involvement as …


The Vulnerabilities Of Artificial Intelligence Models And Potential Defenses, Felix Iov 2024 William & Mary

Cybersecurity Undergraduate Research Showcase

The rapid integration of artificial intelligence (AI) into various commercial products has raised concerns about the security risks posed by adversarial attacks. These attacks manipulate input data to disrupt the functioning of AI models, potentially leading to severe consequences such as self-driving car crashes, financial losses, or data breaches. We will explore neural networks, their weaknesses, and potential defenses. We will discuss adversarial attacks including data poisoning, backdoor attacks, evasion attacks, and prompt injection. Then, we will explore defense strategies such as data protection, input sanitization, and adversarial training. By understanding how adversarial attacks work and the defenses against them, …


Urinalysis Test Data Analysis And Prediction, Nikhil Mhatre 2024 University of Texas at Arlington

2024 Datathon Challenges

OUTLIERS Team submission to the Urinalysis Test Results Timed Challenge

We researched various algorithms, such as boosting and random forests. We learned a lot about their strengths and weaknesses and used these algorithms accordingly to address the issues we faced in the dataset.


Artificial Intelligence Could Probably Write This Essay Better Than Me, Claire Martino 2024 Augustana College, Rock Island Illinois

Augustana Center for the Study of Ethics Essay Contest

No abstract provided.


Localized Collocation Meshless Method For Modeling Transdermal Pharmacokinetics In Multiphase Skin Structures, Eduardo Divo 2024 Embry-Riddle Aeronautical University

Math Department Colloquium Series

The human skin has a complicated structure with many multi-scale, biophysical effects impacting the propagation of skin-injected substances, such as partitioning, metabolic reactions, adsorption and elimination. An extended version of Fick’s second law governing the diffusion of the compound in the various skin layers is employed in the current work by considering the conservation of mass of the substance and the metabolic reaction of the substance in viable skin. Additionally, a model assuming linear coupling between the substance concentrations that are bound and unbound with blood was developed. Using such a model, a set of coupled partial differential equations are …
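For orientation, a generic diffusion-reaction form of Fick's second law is sketched below. The abstract does not give the extended equation actually used, so this is only an assumed textbook form: C is the substance concentration, D a layer-dependent diffusivity, and k_m a hypothetical first-order metabolic rate constant.

```latex
\frac{\partial C}{\partial t} = \nabla \cdot \left( D \, \nabla C \right) - k_{m} C
```

With constant D and no reaction term (k_m = 0), this reduces to the classical form of Fick's second law, \partial C / \partial t = D \nabla^{2} C.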


Health And Healthcare: Designing For The Social Determinants Of Health And Blue Zones In North Nashville, Rebecca Tonguis, Honor Thomas, Olivia Hobbs 2024 Belmont University

Belmont University Research Symposium (BURS)

Owned by North Nashville’s First Community Church, a now empty site in the Osage-North Fisk neighborhood of North Nashville has been identified as a potential site for a new location of The Store, in addition to a community-centric architectural development based on the social determinants of health and informed by the principles behind Blue Zones, the locations with the highest lifespans in the world. Opened by Brad Paisley and Kimberly Williams-Paisley, The Store is a free grocery store that “allow[s] people to shop for their basic needs in a way that protects dignity and fosters hope”, for which North Nashville …


Demographic Data Analysis For Measuring Economic Impact Of The Branch Of Nashville, Tessa Pendleton, Annie Wardroup, Nicole Speyrer, Kimberly Amaya Hernandez 2024 Belmont University

Belmont University Research Symposium (BURS)

As part of the Global Honors Scholars Collaborative, researchers aggregated data from The Belmont Data Collaborative to analyze the three primary ZIP codes (37211, 37013, 37217) served by The Branch of Nashville. These communities include immigrant and refugee populations, whom The Branch supports through its food bank, English classes, and further comprehensive care. Future program development will rely on the analysis of the current client base and eventual assessment of The Branch’s economic impact on the surrounding community. The goal of this research for The Branch of Nashville is twofold: (1) analyze the existing demographics within the above ZIP codes …


Combating Financial Crimes With Unsupervised Learning Techniques: Clustering And Dimensionality Reduction For Anti-Money Laundering, Ahmed N. Bakry, Almohammady S. Alsharkawy, Mohamed S. Farag, Kamal R. Raslan 2024 Faculty of Science Al-Azhar University Cairo, Egypt

Al-Azhar Bulletin of Science

Anti-Money Laundering (AML) is a crucial task in ensuring the integrity of financial systems. One key challenge in AML is identifying high-risk groups based on their behavior. Unsupervised learning, particularly clustering, is a promising solution for this task. However, the use of hundreds of features to describe behavior results in a high-dimensional dataset that negatively impacts clustering performance. In this paper, we investigate the effectiveness of combining the agglomerative hierarchical clustering method with four dimensionality reduction techniques, namely Independent Component Analysis (ICA), Kernel Principal Component Analysis (KPCA), Singular Value Decomposition (SVD), and Locality Preserving Projections (LPP), to overcome the issue of high-dimensionality in AML data and …


Graph Neural Network Guided By Feature Selection And Centrality Measures For Node Classification On Homophilic And Heterophily Graphs, Asmaa M. Mahmoud, Heba F. Eid, Abeer S. Desuky, Hoda A. Ali 2024 Department of Mathematics, Faculty of Science, Al-Azhar University, Cairo, Egypt.

Al-Azhar Bulletin of Science

One of the most recent developments in the fields of deep learning and machine learning is Graph Neural Networks (GNNs). The core task of GNNs is the feature aggregation stage, which is carried out over the node's neighbours without taking into account whether the features are relevant or not. Additionally, the majority of these existing node representation techniques only consider the network's topology structure while completely ignoring the centrality information. In this paper, a new technique for explaining graph features depending on four different feature selection approaches and centrality measures in order to identify the important nodes and relevant node features is …


Improving Educational Delivery And Content In Juvenile Detention Centers, Yomna Elmousalami 2024 Old Dominion University

Undergraduate Research Symposium

Students in juvenile detention centers have the greatest need to receive improvements in educational delivery and content; however, they are one of the “truly disadvantaged” populations in terms of receiving those improvements. This work presents a qualitative data analysis based on a focus group meeting with stakeholders at a local Juvenile Detention Center. The current educational system in juvenile detention centers is based on paper worksheets, single-room style teaching methods, outdated technology, and a shortage of textbooks and teachers. In addition, detained students typically have behavioral challenges that are deemed "undesired" in society. As a result, many students miss classes …


Extracting DNN Architectures Via Runtime Profiling On Mobile GPUs, Dong Hyub Kim 2024 University of Massachusetts Amherst

Masters Theses

Due to significant investment, research, and development efforts over the past decade, deep neural networks (DNNs) have achieved notable advancements in classification and regression domains. As a result, DNNs are considered valuable intellectual property for artificial intelligence providers. Prior work has demonstrated highly effective model extraction attacks which steal a DNN, dismantling the provider’s business model and paving the way for unethical or malicious activities, such as misuse of personal data, safety risks in critical systems, or spreading misinformation. This thesis explores the feasibility of model extraction attacks on mobile devices using aggregated runtime profiles as a side-channel to leak …


Investigation Of Gas Dynamics In Water And Oil-Based Muds Using DAS, DTS, And DSS Measurements, Temitayo S. Adeyemi 2024 Louisiana State University at Baton Rouge

LSU Master's Theses

Reliable prediction of gas migration velocity, void fraction, and length of gas-affected region in water and oil-based muds is essential for effective planning, control, and optimization of drilling operations. However, there is a gap in our understanding of gas behavior and dynamics in water and oil-based muds. This is a consequence of the use of experimental systems that are not representative of field-scale conditions. This study seeks to bridge the gap via the well-scale deployment of distributed fiber-optic sensors for real-time monitoring of gas behavior and dynamics in water and oil-based mud. The aforementioned parameters were estimated in real-time using …


Automated Identification And Mapping Of Interesting Mineral Spectra In CRISM Images, Arun M. Saranathan 2024 University of Massachusetts Amherst

Doctoral Dissertations

The Compact Reconnaissance Imaging Spectrometer for Mars (CRISM) has proven to be an invaluable tool for the mineralogical analysis of the Martian surface. It has been crucial in identifying and mapping the spatial extents of various minerals. Primarily, the identification and mapping of these mineral spectral shapes have been performed manually. Given the size of the CRISM image dataset, manual analysis of the full dataset would be arduous or infeasible. This dissertation attempts to address this issue by describing an automated, machine-learning-based processing pipeline for CRISM data that can be used to identify and map the unique mineral signatures present in …


Data To Science With AI And Human-In-The-Loop, Gustavo Perez Sarabia 2024 University of Massachusetts Amherst

Doctoral Dissertations

AI has the potential to accelerate scientific discovery by enabling scientists to analyze vast datasets more efficiently than traditional methods. For example, this thesis considers the detection of star clusters in high-resolution images of galaxies taken from space telescopes, as well as studying bird migration from RADAR images. In these applications, the goal is to make measurements to answer scientific questions, such as how the star formation rate is affected by mass, or how the phenology of bird migration is influenced by climate change. However, current computer vision systems are far from perfect for conducting these measurements directly. They may …


Historical Perspectives In Volatility Forecasting Methods With Machine Learning, Zhiang Qiu, Clemens Kownatzki, Fabien Scalzo, Eun Sang Cha 2024 Pepperdine University

Seaver College Research And Scholarly Achievement Symposium

Volatility forecasting in the financial market plays a pivotal role across a spectrum of disciplines, such as risk management, option pricing, and market making. However, volatility forecasting is challenging because volatility can only be estimated, and different factors influence volatility, ranging from macroeconomic indicators to investor sentiments. While recent works suggest advances in machine learning and artificial intelligence for volatility forecasting, a comprehensive benchmark of current statistical and learning-based methods for such purposes is lacking. Thus, this paper aims to provide a comprehensive survey of the historical evolution of volatility forecasting with a comparative benchmark of key landmark models. We …
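The remark above that volatility can only be estimated can be made concrete with a minimal sketch of the classic close-to-close historical estimator. This is a textbook baseline, not one of the paper's benchmarked models; the function names, the sample prices, and the 252-periods-per-year convention are assumptions made for illustration.

```python
import math
from statistics import stdev


def log_returns(prices):
    """Return log returns between consecutive prices."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]


def annualized_volatility(prices, periods_per_year=252):
    """Sample standard deviation of log returns, scaled to a yearly horizon."""
    return stdev(log_returns(prices)) * math.sqrt(periods_per_year)


def rolling_volatility(prices, window, periods_per_year=252):
    """Annualized volatility over each trailing window of `window` returns."""
    returns = log_returns(prices)
    scale = math.sqrt(periods_per_year)
    return [
        stdev(returns[end - window:end]) * scale
        for end in range(window, len(returns) + 1)
    ]


# Hypothetical daily closing prices.
closes = [100.0, 101.5, 99.8, 102.3, 101.1, 103.0]

print("annualized volatility:", annualized_volatility(closes))
print("rolling (3-return window):", rolling_volatility(closes, window=3))
```

A naive forecast would simply carry the most recent rolling estimate forward; the statistical and learning-based models surveyed in the paper are generally judged against baselines of this kind.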


Digital Commons powered by bepress