Open Access. Powered by Scholars. Published by Universities.®

Data Science Commons

Open Access. Powered by Scholars. Published by Universities.®

1,461 Full-Text Articles 2,923 Authors 273,342 Downloads 188 Institutions

All Articles in Data Science

Faceted Search

1,461 full-text articles. Page 3 of 72.

Artificial Intelligence For The Electron Ion Collider (Ai4eic), C. Allaire, ..., Cristiano Fanelli, James Giroux, Joey Niestroy, Justin R. Stevens, Patrick Stone, L. Suarez, K. Suresh, Eric Walter, et al. 2024 William & Mary

Artificial Intelligence For The Electron Ion Collider (Ai4eic), C. Allaire, ..., Cristiano Fanelli, James Giroux, Joey Niestroy, Justin R. Stevens, Patrick Stone, L. Suarez, K. Suresh, Eric Walter, Et Al.

Arts & Sciences Articles

The Electron-Ion Collider (EIC), a state-of-the-art facility for studying the strong force, is expected to begin commissioning its first experiments in 2028. This is an opportune time for artificial intelligence (AI) to be included from the start at this facility and in all phases that lead up to the experiments. The second annual workshop organized by the AI4EIC working group, which recently took place, centered on exploring all current and prospective application areas of AI for the EIC. This workshop is not only beneficial for the EIC, but also provides valuable insights for the newly established ePIC collaboration at EIC. …


The Impact Of Accessible Data On Cyberstalking, Elise Kwan 2024 Purdue University

The Impact Of Accessible Data On Cyberstalking, Elise Kwan

The Journal of Purdue Undergraduate Research

No abstract provided.


Model Selection Through Cross-Validation For Supervised Learning Tasks With Manifold Data, Derek Brown 2024 Purdue University Fort Wayne

Model Selection Through Cross-Validation For Supervised Learning Tasks With Manifold Data, Derek Brown

The Journal of Purdue Undergraduate Research

No abstract provided.


Machine Learning Of Big Data: A Gaussian Regression Model To Predict The Spatiotemporal Distribution Of Ground Ozone, Jerry Gu 2024 Purdue University

Machine Learning Of Big Data: A Gaussian Regression Model To Predict The Spatiotemporal Distribution Of Ground Ozone, Jerry Gu

The Journal of Purdue Undergraduate Research

Tracking pollution levels on the ground is important to the environment and public health. One of the pollutants of concern is ozone, which, at high concentrations, can cause respiratory and cardiovascular problems. The National Center for Atmospheric Research (NCAR) has published valuable ozone data obtained from ground-based sensors installed at selected locations. Because it is unfeasible to measure the exact ozone levels everywhere at any time, it would be valuable to predict the temporal-spatial distributions of ozone concentration based on existing data. This would help us better understand the patterns and trends in the data and make better decisions to …


A Computational Profile Of Invasive Lionfish In Belize: A New Insight On A Destructive Species, Joshua E. Balan 2024 Purdue University

A Computational Profile Of Invasive Lionfish In Belize: A New Insight On A Destructive Species, Joshua E. Balan

The Journal of Purdue Undergraduate Research

Since their discovery in the region in 2009, invasive Indonesian-native lionfish have been taking over the Belize Barrier Reef. As a result, populations of local species have dwindled as they are either eaten or outcompeted by the invaders. This has led to devastating losses ecologically and economically; massive industries in the local nations, such as fisheries and tourism, have suffered greatly. Attempting to combat this, local organizations, from nonprofits to ecotourism companies, have been manually spear-hunting them on scuba dives to cull the population. One such company, Reef Conservation Institute (ReefCI), operating out of Tom Owens Caye outside of Placencia, …


Henderson Named One Of The Most Influential People In Legal Education, James Owsley Boyd 2024 Maurer School of Law: Indiana University

Henderson Named One Of The Most Influential People In Legal Education, James Owsley Boyd

Keep Up With the Latest News from the Law School (blog)

Indiana University Maurer School of Law Professor Bill Henderson has once again been recognized as one of the most influential people in legal education, but he’s not the only one with ties to the Law School on this year’s list.

The National Jurist ranked Henderson #18 on its list. Kellye Testy, a 1991 alumna of the Law School and president and CEO of the Law School Admission Council, is ranked second.


Molecular Understanding And Design Of Deep Eutectic Solvents And Proteins Using Computer Simulations And Machine Learning, Usman Lame Abbas 2024 University of Kentucky

Molecular Understanding And Design Of Deep Eutectic Solvents And Proteins Using Computer Simulations And Machine Learning, Usman Lame Abbas

Theses and Dissertations--Chemical and Materials Engineering

Hydrophobic deep eutectic solvents (DESs) have emerged as excellent extractants. A major challenge is the lack of an efficient tool to discover DES candidates. Currently, the search relies heavily on the researchers’ intuition or a trial-and-error process, which leads to a low success rate or bypassing of promising candidates. DES performance depends on the heterogeneous hydrogen bond environment formed by multiple hydrogen bond donors and acceptors. Understanding this heterogeneous hydrogen bond environment can help develop principles for designing high performance DESs for extraction and other separation applications. This work investigates the structure and dynamics of hydrogen bonds in hydrophobic DESs …


In Pursuit Of Consumption-Based Forecasting, Charles Chase, Kenneth B. Kahn 2024 SAS

In Pursuit Of Consumption-Based Forecasting, Charles Chase, Kenneth B. Kahn

Marketing Faculty Publications

[Introduction] Today's most mature, most sophisticated, best-in-class forecasting is what we call consumption-based forecasting (CBF). In contrast, the least sophisticated companies typically do not forecast at all, but rather set financial targets based on management expectations. Companies beginning to use statistical forecasting techniques usually take a supply-centric orientation, relying on time series techniques applied to shipment and/or order history. The next stage of progression is to incorporate promotions data, economic data, and market data alongside supply-centric data so that regression and other advanced analytics can be used. Companies pursing CBF utilize even more advanced capabilities to capture, examine, and understand …


Machine Learning Approaches For Cyberbullying Detection, Roland Fiagbe 2024 University of Central Florida

Machine Learning Approaches For Cyberbullying Detection, Roland Fiagbe

Data Science and Data Mining

Cyberbullying refers to the act of bullying using electronic means and the internet. In recent years, this act has been identifed to be a major problem among young people and even adults. It can negatively impact one’s emotions and lead to adverse outcomes like depression, anxiety, harassment, and suicide, among others. This has led to the need to employ machine learning techniques to automatically detect cyberbullying and prevent them on various social media platforms. In this study, we want to analyze the combination of some Natural Language Processing (NLP) algorithms (such as Bag-of-Words and TFIDF) with some popular machine learning …


Predicting Superconducting Critical Temperature Using Regression Analysis, Roland Fiagbe 2024 University of Central Florida

Predicting Superconducting Critical Temperature Using Regression Analysis, Roland Fiagbe

Data Science and Data Mining

This project estimates a regression model to predict the superconducting critical temperature based on variables extracted from the superconductor’s chemical formula. The regression model along with the stepwise variable selection gives a reasonable and good predictive model with a lower prediction error (MSE). Variables extracted based on atomic radius, valence, atomic mass and thermal conductivity appeared to have the most contribution to the predictive model.


A Bayesian Inversion For Emissions And Export Productivity Across The End-Cretaceous Boundary, Alexander A. Cox 2024 Dartmouth College

A Bayesian Inversion For Emissions And Export Productivity Across The End-Cretaceous Boundary, Alexander A. Cox

Dartmouth College Master’s Theses

The end-Cretaceous mass extinction was marked by both the Chicxulub impact and the ongoing emplacement of the Deccan Traps flood basalt province. Both of these events perturbed the environment by the emission of climate-active volatiles, primarily CO2 and SO2. To understand the mechanism of extinction, we must disentangle the timing, duration, and intensity of volcanic and meteoritic environmental forcings. In this thesis, we used a parallel Markov chain Monte Carlo approach to invert for the aforementioned volatile emissions, export productivity, and remineralization from 67 to 65 million years ago using the LOSCAR (Long-term Ocean-atmosphere-Sediment CArbon cycle Reservoir) model. The parallel …


Towards Algorithmic Justice: Human Centered Approaches To Artificial Intelligence Design To Support Fairness And Mitigate Bias In The Financial Services Sector, Jihyun Kim 2024 Claremont Colleges

Towards Algorithmic Justice: Human Centered Approaches To Artificial Intelligence Design To Support Fairness And Mitigate Bias In The Financial Services Sector, Jihyun Kim

CMC Senior Theses

Artificial Intelligence (AI) has positively transformed the Financial services sector but also introduced AI biases against protected groups, amplifying existing prejudices against marginalized communities. The financial decisions made by biased algorithms could cause life-changing ramifications in applications such as lending and credit scoring. Human Centered AI (HCAI) is an emerging concept where AI systems seek to augment, not replace human abilities while preserving human control to ensure transparency, equity and privacy. The evolving field of HCAI shares a common ground with and can be enhanced by the Human Centered Design principles in that they both put humans, the user, at …


Methods That Support The Validation Of Agent-Based Models: An Overview And Discussion, Andrew Collins, Matthew Koehler, Christopher Lynch 2024 Old Dominion University

Methods That Support The Validation Of Agent-Based Models: An Overview And Discussion, Andrew Collins, Matthew Koehler, Christopher Lynch

Engineering Management & Systems Engineering Faculty Publications

Validation is the process of determining if a model adequately represents the system under study for the model’s intended purpose. Validation is a critical component in building the credibility of a simulation model with its end-users. Effectively conducting validation can be a daunting task for both novice and experienced simulation developers. Further compounding the difficult task of conducting validation is that there is no universally accepted approach for assessing a simulation. These challenges are particularly relevant to the paradigm of Agent-Based Modeling and Simulation (ABMS) because of the complexity found in these models’ mechanisms and in the real-world situations they …


A Holistic Approach To Performance Prediction In Collegiate Athletics: Player, Team, And Conference Perspectives, Christopher Taber, S. Sharma, Mehul S. Raval, Samah Senbel, Allison Keefe, Jui Shah, Emma Patterson, Julie K. Nolan, N.S. Artan, Tolga Kaya 2024 Sacred Heart University

A Holistic Approach To Performance Prediction In Collegiate Athletics: Player, Team, And Conference Perspectives, Christopher Taber, S. Sharma, Mehul S. Raval, Samah Senbel, Allison Keefe, Jui Shah, Emma Patterson, Julie K. Nolan, N.S. Artan, Tolga Kaya

Exercise Science Faculty Publications

Predictive sports data analytics can be revolutionary for sports performance. Existing literature discusses players' or teams' performance, independently or in tandem. Using Machine Learning (ML), this paper aims to holistically evaluate player-, team-, and conference (season)-level performances in Division-1 Women's basketball. The players were monitored and tested through a full competitive year. The performance was quantified at the player level using the reactive strength index modified (RSImod), at the team level by the game score (GS) metric, and finally at the conference level through Player Efficiency Rating (PER). The data includes parameters from training, subjective stress, sleep, and recovery (WHOOP …


Combating Cyberbullying On Social Media: A Machine Learning Approach With Text Analysis On Twitter, Amir Alipour Yengejeh 2024 University of Central Florida

Combating Cyberbullying On Social Media: A Machine Learning Approach With Text Analysis On Twitter, Amir Alipour Yengejeh

Data Science and Data Mining

The popularity of the electronic mobile devices along with social media as well as networking websites have been tremendously increased in the recent year. Most people around the world daily engage in the variety of cyberspace additives. Even though the users can take most advantages of these system such as exchange the idea and information, being sociable, and enjoyments, they might be faced with such adverse behaviors such as toxicity, bullying, extremism, and cruelty. The recent statistics reports that such mentioned behaviors has been noticeably grown on the cyberspace such that can threaten the individuals and even any community. Thus, …


Advancing Cancer Classifcation Through Machine Learning Analysis Of Rna-Seq Gene Expression Data, Emil Agbemade, Amina Issoufou Anaroua, Dimitri Bamba 2024 University of Central Florida

Advancing Cancer Classifcation Through Machine Learning Analysis Of Rna-Seq Gene Expression Data, Emil Agbemade, Amina Issoufou Anaroua, Dimitri Bamba

Data Science and Data Mining

This study delves into the classifcation of various cancer types using the RNA-Seq (HiSeq) PANCAN dataset from the UCI Machine Learning Repository, which encompasses a rich collection of gene expression data across multiple tumor samples. To improve cancer diagnosis and treatment, our methodology confronts the challenges inherent in high-dimensional datasets, such as the Hughes Effect and the Curse of Dimensionality, through innovative feature selection methods and machine learning approaches. A key component of our strategy includes the use of tree-based algorithms, particularly Random Forest, to refine the dataset to seventy genes of utmost relevance for tumor classifcation, and the application …


Adaptive Multi-Label Classification On Drifting Data Streams, Martha Roseberry 2024 Virginia Commonwealth University

Adaptive Multi-Label Classification On Drifting Data Streams, Martha Roseberry

Theses and Dissertations

Drifting data streams and multi-label data are both challenging problems. When multi-label data arrives as a stream, the challenges of both problems must be addressed along with additional challenges unique to the combined problem. Algorithms must be fast and flexible, able to match both the speed and evolving nature of the stream. We propose four methods for learning from multi-label drifting data streams. First, a multi-label k Nearest Neighbors with Self Adjusting Memory (ML-SAM-kNN) exploits short- and long-term memories to predict the current and evolving states of the data stream. Second, a punitive k nearest neighbors algorithm with a self-adjusting …


Mhair: A Dataset Of Audio-Image Representations For Multimodal Human Actions, Muhammad Bilal Shaikh, Douglas Chai, Syed M. S. Islam, Naveed Akhtar 2024 Edith Cowan University

Mhair: A Dataset Of Audio-Image Representations For Multimodal Human Actions, Muhammad Bilal Shaikh, Douglas Chai, Syed M. S. Islam, Naveed Akhtar

Research outputs 2022 to 2026

Audio-image representations for a multimodal human action (MHAiR) dataset contains six different image representations of the audio signals that capture the temporal dynamics of the actions in a very compact and informative way. The dataset was extracted from the audio recordings which were captured from an existing video dataset, i.e., UCF101. Each data sample captured a duration of approximately 10 s long, and the overall dataset was split into 4893 training samples and 1944 testing samples. The resulting feature sequences were then converted into images, which can be used for human action recognition and other related tasks. These images can …


Xgboost Hyperberd Model Using Steam Platform, Yuh-Haur Chen 2024 University of Central Florida

Xgboost Hyperberd Model Using Steam Platform, Yuh-Haur Chen

Data Science and Data Mining

This project investigates game pricing strategies in the Steam market using an XGBoost model, drawing motivation from Professor Xie's lecture, and presenting findings through a density plot that delineates two primary pricing strategies. A free-to-play approach, indicated by a significant hot spot, is adopted by developers focusing on post-purchase revenues through DLC, aesthetic purchases, and in-game transactions. This sailing strategy includes community-centric developers aiming to distribute their games for player engagement rather than profit.

The project illustrates the effectiveness of advanced modeling techniques in handling complex datasets, with significant predictive accuracy reflected by a reduced MSE from 0.3472 to 0.1397. …


Learning Optimal Inter-Class Margin Adaptively For Few-Shot Class-Incremental Learning Via Neural Collapse-Based Meta-Learning, Hang Ran, Weijun Li, Lusi Li, Songsong Tian, Xin Ning, Prayag Tiwari 2024 Chinese Academy of Sciences

Learning Optimal Inter-Class Margin Adaptively For Few-Shot Class-Incremental Learning Via Neural Collapse-Based Meta-Learning, Hang Ran, Weijun Li, Lusi Li, Songsong Tian, Xin Ning, Prayag Tiwari

Computer Science Faculty Publications

Few-Shot Class-Incremental Learning (FSCIL) aims to learn new classes incrementally with a limited number of samples per class. It faces issues of forgetting previously learned classes and overfitting on few-shot classes. An efficient strategy is to learn features that are discriminative in both base and incremental sessions. Current methods improve discriminability by manually designing inter-class margins based on empirical observations, which can be suboptimal. The emerging Neural Collapse (NC) theory provides a theoretically optimal inter-class margin for classification, serving as a basis for adaptively computing the margin. Yet, it is designed for closed, balanced data, not for sequential or few-shot …


Digital Commons powered by bepress