Open Access. Powered by Scholars. Published by Universities.®

Data Science Commons

1,509 Full-Text Articles 3,020 Authors 435,013 Downloads 190 Institutions

All Articles in Data Science

Data Collector Selection Ranking-Based Method For Collaborative Multi-Tasks In Ubiquitous Environments, Belal Z. Hassan, Ahmed. A. A. Gad-Elrab, Mohamed S. Farag, S. E. Abu-Youssef 2024 Faculty of Science, Al-Azhar University Cairo, Egypt

Al-Azhar Bulletin of Science

In Ubiquitous Computing and the Internet of Things, the sensing and control of objects involve numerous devices collecting and transmitting data. However, connecting these devices without fostering collaboration leads to suboptimal system performance. As the number of connected sensing devices in the Internet of Things increases, efficient task accomplishment through collaboration becomes imperative. This paper proposes a Data Collector Selection Method for Collaborative Multi-Tasks to address this challenge, considering task preferences and uncertainty in data collectors' contributions. The proposed method incorporates three key aspects: (1) Using Fuzzy Analytical Hierarchy Process to determine optimal weights for task preferences; (2) Ranking data collectors …


Dual-Domain Clustering Of Spatiotemporal Infectious Disease Data, Samuel R. Thornton, Erin C.S. Acquesta, Patrick D. Finley, Mansoor A. Haider 2024 NC State University

Biology and Medicine Through Mathematics Conference

No abstract provided.


A NLP Approach To Automating The Generation Of Surveys For Market Research, Anav Chug 2024 Georgia Southern University

Honors College Theses

Market research is vital but includes activities that are often laborious and time-consuming. Survey questionnaires are one possible output of the process, and market researchers spend a lot of time manually developing questions for focus groups. The proposed research aims to develop a software prototype that utilizes Natural Language Processing (NLP) to automate the process of generating survey questions for market research. The software uses a pre-trained OpenAI language model to generate multiple-choice survey questions based on a given product prompt, sends them to a targeted email list, and provides a real-time analysis of the responses …


Surmounting Challenges In Aggregating Results From Static Analysis Tools, Dr. Ann Marie Reinhold, Brittany Boles, A. Redempta Manzi Muneza, Thomas McElroy, Dr. Clemente Izurieta 2024 Montana State University

Military Cyber Affairs

Aggregation poses a significant challenge for software practitioners because it requires a comprehensive and nuanced understanding of raw data from diverse sources. Suites of static-analysis tools (SATs) are commonly used to assess organizational security but simultaneously introduce significant challenges. Challenges include unique results, scales, configuration environments for each SAT execution, and incompatible formats between SAT outputs. Here, we document our experiences addressing these issues. We highlight the problem of relying on a single vendor's SAT version and offer a solution for aggregating findings across multiple SATs, aiming to enhance software security practices and deter threats early with robust defensive operations.


Data Analysis Project For Preferred Credit Inc., Emily Smith, Greta Nesbit, Jack Simonet, Ignacio Sanchez-Romero 2024 College of Saint Benedict/Saint John's University

Celebrating Scholarship and Creativity Day (2018-)

This project focuses on transforming real data within PCI's operations into valuable insights through an approach of coding, data cleaning, and visualization. By leveraging advanced techniques, the project aims to uncover key trends and create visually compelling representations to aid decision-making within the company. The outcome will allow PCI stakeholders the ability to extract valuable insights, optimize processes, and drive initiatives for growth and competitive advantage in the finance industry.


Computational Linguistics And Multilingualism: A Comparative Analysis With Spanish And English Data, Evelyn Lawrie 2024 Chapman University

Student Scholar Symposium Abstracts and Posters

Computational linguistics is an increasingly ubiquitous field, serving as the basis for artificial intelligence and machine translation. It aims to analyze the syntax and semantics of individual words and phrases. While there have been in-depth advancements in computational linguistics strategies for the English language, others have not been developed as thoroughly. This lack of emphasis on multilingualism has contributed to the disappearance of Hispanic perspectives in the digital world, especially those of indigenous heritage, as the decline of many indigenous languages has been exacerbated by the lack of digital translation services. Sentiment analysis is a branch of computational linguistics that …


A Novel Correction For The Multivariate Ljung-Box Test, Minhao Huang 2024 Chapman University

Computational and Data Sciences (PhD) Dissertations

This research introduces an analytical improvement to the Multivariate Ljung-Box test that addresses significant deviations of the original test from the nominal Type I error rates under almost all scenarios. Prior attempts to mitigate this issue have been directed at modification of the test statistics or correction of the test distribution to achieve precise results in finite samples. In previous studies focused on designing corrections to the univariate Ljung-Box test, a method that specifically adjusts the test rejection region has been the most successful at attaining the best Type I error rates. We adopt the same approach for the more complex, …


Traffic Analysis Of Cities In San Bernardino County, Sai Kalyan Ayyagari 2024 California State University – San Bernardino

Electronic Theses, Projects, and Dissertations

This research offers an in-depth analysis of vehicular traffic within San Bernardino County, California, aiming to spotlight congestion areas and suggest improvements for more efficient and sustainable transportation. Leveraging 2021 data from StreetLight Data, traffic patterns in 15 key cities were examined based on their population sizes, covering various vehicle types to dissect dynamics and flow. The methodology focused on analyzing trip purposes and metrics to calculate Vehicle Miles Traveled (VMT) and its influence on congestion and environmental factors.

Findings indicate considerable disparities in traffic volume, purposes, and timings across different urban areas, with population density and intercity connections significantly …


Truck Traffic Analysis In The Inland Empire, Bhavik Khatri 2024 California State University – San Bernardino

Electronic Theses, Projects, and Dissertations

This study undertakes a meticulous examination of truck traffic within the Inland Empire, focusing on the distribution and dynamics of medium and heavy-duty vehicles, to advocate for the region's transition to electric trucks. Utilizing advanced spatial analysis and data from StreetLight Data, it segments the region into six subregions, revealing distinct traffic patterns and environmental impacts. Notably, the research uncovers that the North Center and West zones, integral to the logistics and warehousing sectors, exhibit the highest traffic volumes, significantly influencing air quality and infrastructure.

Quantitative results from 2021 illustrate a pronounced disparity in truck activity: medium-weight vehicles accounted for …


A Framework That Explores The Cognitive Load Of CS1 Assignments Using Pausing Behavior, Joshua O. Urry 2024 Utah State University

All Graduate Theses and Dissertations, Fall 2023 to Present

Pausing behavior in introductory Computer Science (CS1) courses has been related to a student’s performance in the course and could be linked to a student’s cognitive load, or assignment difficulty. Having an objective measure of the cognitive load would be beneficial to course instructors as it would help them design assignments that are not too difficult. Two studies are presented in this work. The first study uses Cognitive Load Theory and Vygotsky’s Zone of Proximal Development as a theoretical framework to analyze pause times between keystrokes to better understand what types of assignments need more educational support than others. The …


Evaluation Of An End-To-End Radiotherapy Treatment Planning Pipeline For Prostate Cancer, Mohammad Daniel El Basha, Court Laurence, Carlos Eduardo Cardenas, Julianne Pollard-Larkin, Steven Frank, David T. Fuentes, Falk Poenisch, Zhiqian H. Yu 2024 The University of Texas MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences

Dissertations & Theses (Open Access)

Radiation treatment planning is a crucial and time-intensive process in radiation therapy. This planning involves carefully designing a treatment regimen tailored to a patient’s specific condition, including the type, location, and size of the tumor with reference to surrounding healthy tissues. For prostate cancer, this tumor may be either local, locally advanced with extracapsular involvement, or extend into the pelvic lymph node chain. Automating essential parts of this process would allow for the rapid development of effective treatment plans and better plan optimization to enhance tumor control for better outcomes.

The first objective of this work, to automate the treatment …


Interpreting Shift Encoders As State Space Models For Stationary Time Series, Patrick Donkoh 2024 East Tennessee State University

Electronic Theses and Dissertations

Time series analysis is a statistical technique used to analyze sequential data points collected or recorded over time. While traditional models such as autoregressive models and moving average models have performed sufficiently for time series analysis, the advent of artificial neural networks has provided models that have suggested improved performance. In this research, we provide a custom neural network, a shift encoder, that can capture the intricate temporal patterns of time series data. We then compare the sparse matrix of the shift encoder to the parameters of the autoregressive model and observe the similarities. We further explore how we can …


Code For Care: Hypertension Prediction In Women Aged 18-39 Years, Kruti Sheth 2024 California State University, San Bernardino

Electronic Theses, Projects, and Dissertations

The longstanding prevalence of hypertension, often undiagnosed, poses significant risks of severe chronic and cardiovascular complications if left untreated. This study investigated the causes and underlying risks of hypertension in females aged 18-39 years. The research questions were: (Q1.) What factors affect the occurrence of hypertension in females aged 18-39 years? (Q2.) What machine learning algorithms are suited for effectively predicting hypertension? (Q3.) How can SHAP values be leveraged to analyze the factors from model outputs? The findings are: (Q1.) Performing feature selection using a binary-classification logistic regression algorithm reveals an array of the 30 most influential factors at an …


The Quantitative Analysis And Visualization Of NFL Passing Routes, Sandeep Chitturi 2024 University of Arkansas, Fayetteville

Computer Science and Computer Engineering Undergraduate Honors Theses

The strategic planning of offensive passing plays in the NFL incorporates numerous variables, including defensive coverages, player positioning, historical data, etc. This project develops an application using an analytical framework and an interactive model to simulate and visualize an NFL offense's passing strategy under varying conditions. Using R-programming and data management, the model dynamically represents potential passing routes in response to different defensive schemes. The system architecture integrates data from historical NFL league years to generate quantified route scores through designed mathematical equations. This allows for the prediction of potential passing routes for offensive skill players in response to the …


Sequential Optimization For Stressor-Informed Test Planning Through Integration Of Experimental And Simulated Data, Jacob Brecheisen 2024 University of Arkansas, Fayetteville

Data Science Undergraduate Honors Theses

This technical report details an innovative approach in reliability engineering aimed at maximizing system durability through a synergistic use of physical experimentation and computer-based modeling. Our methodology explores the efficient design and analysis of computer experiments and physical tests to facilitate accelerated reliability growth, while leveraging a sequential integration of data from these two distinct sources: costly physical experiments, characterized by random errors, and inexpensive computer simulations, marked by inherent systematic errors. The key innovation lies in the adoption of a closed-loop design and analysis method. This method begins by identifying a viable subset of important environmental stressors—such as temperature, …


Develop An Interactive Python Dashboard For Analyzing EZproxy Logs, Andy Huff, Matthew Roth, Weiling Liu 2024 University of Louisville

Faculty Scholarship

This paper describes the development of an interactive dashboard in Python with EZproxy log data. Hopefully, this dashboard will help improve the evidence-based decision-making process in electronic resources management and explore the impact of library use.


Data Engineering: Building Software Efficiency In Medium To Large Organizations, Alessandro De La Torre 2024 Whittier College

Whittier Scholars Program

The introduction of PoetHQ, a mobile application, offers an economical strategy for colleges, potentially ushering in significant cost savings. These savings could be redirected towards enhancing academic programs and services, enriching the educational landscape for students. PoetHQ aims to democratize access to crucial software, effectively removing financial barriers and facilitating a richer educational experience. By providing an efficient software solution that reduces organizational overhead while maximizing accessibility for students, the project highlights the essential role of equitable education and resource optimization within academic institutions.


Visualizing NFL Player Metrics, Jayson Rhea 2024 Southern Adventist University

Campus Research Day

This project is dedicated to reshaping the exploration of NFL player data. Tailored for sports analysts and fantasy football managers, the goal is to deliver convenience through seamless data navigation and precise filtering through an interactive dashboard. In contrast to the static formats found on the NFL website and ESPN, this dynamic interface offers interactive visualizations, empowering users to effortlessly compare data. These comparisons can be used to draw quick conclusions about player performance.


Dashboard To Quickly Estimate The Cost And Duration Of An NYC Green Taxi Trip, Isaac Braun 2024 Southern Adventist University

Campus Research Day

Before hailing a New York City (NYC) taxi, residents and tourists do not easily know how much the trip will cost them or how long it may take. Taxis are still heavily used, even with the increase of ride-hailing services like Uber, and a new system has yet to be built to provide customers with these two metrics before taking a trip. This project aims to give riders a quick way to estimate a ride’s cost and duration through an interactive dashboard that allows filtering by pickup and drop-off neighborhoods. This is accomplished by analyzing three years of public data …


Binder, Tyler A. Peaster, Lindsey M. Davenport, Madelyn Little, Alex Bales 2024 Arkansas Tech University

ATU Research Symposium

Binder is a mobile application that aims to introduce readers to a book recommendation service that appeals to devoted and casual readers. The main goal of Binder is to enrich book selection and reading experience. This project was created in response to deficiencies in the mobile space for book suggestions, library management, and reading personalization. The tools we used to create the project include Visual Studio, .NET MAUI Framework, C#, XAML, CSS, MongoDB, NoSQL, Git, GitHub, and Figma. The project’s selection of books was sourced from the Google Books repository. Binder aims to provide an intuitive interface that allows users …

