Open Access. Powered by Scholars. Published by Universities.®

Data Science Commons

Articles 1 - 30 of 533

Full-Text Articles in Data Science

Data Analysis Project For Preferred Credit Inc., Emily Smith, Greta Nesbit, Jack Simonet, Ignacio Sanchez-Romero May 2024

Celebrating Scholarship and Creativity Day (2018-)

This project focuses on transforming real data within PCI's operations into valuable insights through an approach of coding, data cleaning, and visualization. By leveraging advanced techniques, the project aims to uncover key trends and create visually compelling representations to aid decision-making within the company. The outcome will give PCI stakeholders the ability to extract valuable insights, optimize processes, and drive initiatives for growth and competitive advantage in the finance industry.


Computational Linguistics And Multilingualism: A Comparative Analysis With Spanish And English Data, Evelyn Lawrie May 2024

Student Scholar Symposium Abstracts and Posters

Computational linguistics is an increasingly ubiquitous field, serving as the basis for artificial intelligence and machine translation. It aims to analyze the syntax and semantics of individual words and phrases. While there have been in-depth advancements in computational linguistics strategies for the English language, strategies for other languages have not been developed as thoroughly. This lack of emphasis on multilingualism has contributed to the disappearance of Hispanic perspectives in the digital world, especially those of indigenous heritage, as the decline of many indigenous languages has been exacerbated by the lack of digital translation services. Sentiment analysis is a branch of computational linguistics that …


Develop An Interactive Python Dashboard For Analyzing Ezproxy Logs, Andy Huff, Matthew Roth, Weiling Liu Apr 2024

Faculty Scholarship

This paper describes the development of an interactive dashboard in Python with EZproxy log data. Hopefully, this dashboard will help improve the evidence-based decision-making process in electronic resources management and explore the impact of library use.


Data Engineering: Building Software Efficiency In Medium To Large Organizations, Alessandro De La Torre Apr 2024

Whittier Scholars Program

The introduction of PoetHQ, a mobile application, offers an economical strategy for colleges, potentially ushering in significant cost savings. These savings could be redirected towards enhancing academic programs and services, enriching the educational landscape for students. PoetHQ aims to democratize access to crucial software, effectively removing financial barriers and facilitating a richer educational experience. By providing an efficient software solution that reduces organizational overhead while maximizing accessibility for students, the project highlights the essential role of equitable education and resource optimization within academic institutions.


Urinalysis Test Data Analysis And Prediction, Nikhil Mhatre Apr 2024

2024 Datathon Challenges

OUTLIERS Team submission to the Urinalysis Test Results Timed Challenge

We researched various algorithms, such as boosting and random forest, learned about their strengths and weaknesses, and applied them accordingly to address the issues found in the dataset.


Artificial Intelligence Could Probably Write This Essay Better Than Me, Claire Martino Apr 2024

Augustana Center for the Study of Ethics Essay Contest

No abstract provided.


Transcriptional Dynamics During Rhodococcus Erythropolis Infection With Phage Wc1, Dana Willner, Sudip Paudel, Andrew D. Halleran, Grace E. Solini, Veronica Gray, Margaret Saha Apr 2024

Arts & Sciences Articles

Background

Belonging to the Actinobacteria phylum, members of the Rhodococcus genus thrive in soil, water, and even intracellularly. While most species are non-pathogenic, several cause respiratory disease in animals and, more rarely, in humans. Over 100 phages that infect Rhodococcus species have been isolated, but despite their importance for Rhodococcus ecology and biotechnology applications, little is known regarding the molecular genetic interactions between phage and host during infection. To address this need, we report RNA-Seq analysis of a novel Rhodococcus erythropolis phage, WC1, analyzing both the phage and host transcriptome at various stages throughout the infection process.

Results

By five …


Health And Healthcare: Designing For The Social Determinants Of Health And Blue Zones In North Nashville, Rebecca Tonguis, Honor Thomas, Olivia Hobbs Apr 2024

Belmont University Research Symposium (BURS)

Owned by North Nashville’s First Community Church, a now empty site in the Osage-North Fisk neighborhood of North Nashville has been identified as a potential site for a new location of The Store, in addition to a community-centric architectural development based on the social determinants of health and informed by the principles behind Blue Zones, the locations with the highest lifespans in the world. Opened by Brad Paisley and Kimberly Williams-Paisley, The Store is a free grocery store that “allow[s] people to shop for their basic needs in a way that protects dignity and fosters hope”, for which North Nashville …


Demographic Data Analysis For Measuring Economic Impact Of The Branch Of Nashville, Tessa Pendleton, Annie Wardroup, Nicole Speyrer, Kimberly Amaya Hernandez Apr 2024

Belmont University Research Symposium (BURS)

As part of the Global Honors Scholars Collaborative, researchers aggregated data from The Belmont Data Collaborative to analyze the three primary ZIP codes (37211, 37013, 37217) served by The Branch of Nashville. These communities include immigrant and refugee populations, whom The Branch supports through its food bank, English classes, and further comprehensive care. Future program development will rely on the analysis of the current client base and eventual assessment of The Branch’s economic impact on the surrounding community. The goal of this research for The Branch of Nashville is twofold: (1) analyze the existing demographics within the above ZIP codes …


Anomaly Detection On Small Wind Turbine Blades Using Deep Learning Algorithms, Bridger Altice, Edwin Nazario, Mason Davis, Mohammad Shekaramiz, Todd K. Moon, Mohammad A. S. Masoum Feb 2024

Electrical and Computer Engineering Faculty Publications

Wind turbine blade maintenance is expensive, dangerous, time-consuming, and prone to misdiagnosis. A potential solution to aid preventative maintenance is using deep learning and drones for inspection and early fault detection. In this research, five base deep learning architectures are investigated for anomaly detection on wind turbine blades, including Xception, ResNet-50, AlexNet, and VGG-19, along with a custom convolutional neural network. For further analysis, transfer learning approaches were also proposed and developed, utilizing these architectures as the feature extraction layers. In order to investigate model performance, a new dataset containing 6000 RGB images was created, making use of indoor and …


Artificial Intelligence For The Electron Ion Collider (Ai4eic), C. Allaire, ..., Cristiano Fanelli, James Giroux, Joey Niestroy, Justin R. Stevens, Patrick Stone, L. Suarez, K. Suresh, Eric Walter, Et Al. Feb 2024

Arts & Sciences Articles

The Electron-Ion Collider (EIC), a state-of-the-art facility for studying the strong force, is expected to begin commissioning its first experiments in 2028. This is an opportune time for artificial intelligence (AI) to be included from the start at this facility and in all phases that lead up to the experiments. The second annual workshop organized by the AI4EIC working group, which recently took place, centered on exploring all current and prospective application areas of AI for the EIC. This workshop is not only beneficial for the EIC, but also provides valuable insights for the newly established ePIC collaboration at EIC. …


Henderson Named One Of The Most Influential People In Legal Education, James Owsley Boyd Jan 2024

Keep Up With the Latest News from the Law School (blog)

Indiana University Maurer School of Law Professor Bill Henderson has once again been recognized as one of the most influential people in legal education, but he’s not the only one with ties to the Law School on this year’s list.

The National Jurist ranked Henderson #18 on its list. Kellye Testy, a 1991 alumna of the Law School and president and CEO of the Law School Admission Council, is ranked second.


In Pursuit Of Consumption-Based Forecasting, Charles Chase, Kenneth B. Kahn Jan 2024

Marketing Faculty Publications

[Introduction] Today's most mature, most sophisticated, best-in-class forecasting is what we call consumption-based forecasting (CBF). In contrast, the least sophisticated companies typically do not forecast at all, but rather set financial targets based on management expectations. Companies beginning to use statistical forecasting techniques usually take a supply-centric orientation, relying on time series techniques applied to shipment and/or order history. The next stage of progression is to incorporate promotions data, economic data, and market data alongside supply-centric data so that regression and other advanced analytics can be used. Companies pursuing CBF utilize even more advanced capabilities to capture, examine, and understand …


Machine Learning Approaches For Cyberbullying Detection, Roland Fiagbe Jan 2024

Data Science and Data Mining

Cyberbullying refers to the act of bullying using electronic means and the internet. In recent years, this act has been identified as a major problem among young people and even adults. It can negatively impact one’s emotions and lead to adverse outcomes like depression, anxiety, harassment, and suicide, among others. This has led to the need to employ machine learning techniques to automatically detect cyberbullying and prevent it on various social media platforms. In this study, we want to analyze the combination of some Natural Language Processing (NLP) algorithms (such as Bag-of-Words and TFIDF) with some popular machine learning …
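
As a rough illustration of the Bag-of-Words and TF-IDF representations named in this abstract, the sketch below computes both with the Python standard library. The tokenizer (lowercase, whitespace split), the unsmoothed weighting `tf * log(N / df)`, and the example messages are assumptions made for illustration; the study's own preprocessing and weighting are not described in the listing.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)  # the bag-of-words counts for this document
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

# Hypothetical messages, not taken from any dataset in the study.
messages = ["you are great", "you are awful", "you played well"]
vectors = tf_idf(messages)
```

A term that occurs in every message ("you") receives weight zero, while a term unique to one message ("awful") receives the largest weight in that message, which is why TF-IDF features tend to emphasize distinctive vocabulary.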


Predicting Superconducting Critical Temperature Using Regression Analysis, Roland Fiagbe Jan 2024

Data Science and Data Mining

This project estimates a regression model to predict the superconducting critical temperature based on variables extracted from the superconductor’s chemical formula. The regression model, combined with stepwise variable selection, gives a reasonably good predictive model with a lower prediction error (MSE). Variables extracted based on atomic radius, valence, atomic mass, and thermal conductivity appeared to contribute the most to the predictive model.
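
To make the prediction-error comparison concrete, here is a minimal sketch of fitting a one-variable least-squares line and comparing its mean squared error (MSE) with a mean-only baseline. The data values and the names `atomic_mass` and `critical_temp` are invented for illustration; the project itself uses many predictors and stepwise selection, which this sketch does not reproduce.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def mse(ys, preds):
    """Mean squared error between observed and predicted values."""
    return mean((y - p) ** 2 for y, p in zip(ys, preds))

# Made-up illustrative data, not from the project.
atomic_mass = [10.0, 20.0, 30.0, 40.0, 50.0]
critical_temp = [12.0, 19.0, 33.0, 38.0, 53.0]

a, b = fit_line(atomic_mass, critical_temp)
baseline_mse = mse(critical_temp, [mean(critical_temp)] * len(critical_temp))
model_mse = mse(critical_temp, [a + b * x for x in atomic_mass])
```

A variable earns its place in a stepwise procedure when adding it lowers the error enough to justify the extra term; here `model_mse` is far below `baseline_mse` because the toy data are nearly linear.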


A Holistic Approach To Performance Prediction In Collegiate Athletics: Player, Team, And Conference Perspectives, Christopher Taber, S. Sharma, Mehul S. Raval, Samah Senbel, Allison Keefe, Jui Shah, Emma Patterson, Julie K. Nolan, N.S. Artan, Tolga Kaya Jan 2024

Exercise Science Faculty Publications

Predictive sports data analytics can be revolutionary for sports performance. Existing literature discusses players' or teams' performance, independently or in tandem. Using Machine Learning (ML), this paper aims to holistically evaluate player-, team-, and conference (season)-level performances in Division-1 Women's basketball. The players were monitored and tested through a full competitive year. The performance was quantified at the player level using the reactive strength index modified (RSImod), at the team level by the game score (GS) metric, and finally at the conference level through Player Efficiency Rating (PER). The data includes parameters from training, subjective stress, sleep, and recovery (WHOOP …


Combating Cyberbullying On Social Media: A Machine Learning Approach With Text Analysis On Twitter, Amir Alipour Yengejeh Jan 2024

Data Science and Data Mining

The popularity of mobile electronic devices, social media, and networking websites has increased tremendously in recent years. Most people around the world engage daily in a variety of cyberspace activities. Although users can benefit from these systems by exchanging ideas and information, socializing, and finding entertainment, they may also encounter adverse behaviors such as toxicity, bullying, extremism, and cruelty. Recent statistics report that these behaviors have grown noticeably in cyberspace, to the point that they can threaten individuals and even entire communities. Thus, …


Advancing Cancer Classification Through Machine Learning Analysis Of Rna-Seq Gene Expression Data, Emil Agbemade, Amina Issoufou Anaroua, Dimitri Bamba Jan 2024

Data Science and Data Mining

This study delves into the classification of various cancer types using the RNA-Seq (HiSeq) PANCAN dataset from the UCI Machine Learning Repository, which encompasses a rich collection of gene expression data across multiple tumor samples. To improve cancer diagnosis and treatment, our methodology confronts the challenges inherent in high-dimensional datasets, such as the Hughes Effect and the Curse of Dimensionality, through innovative feature selection methods and machine learning approaches. A key component of our strategy includes the use of tree-based algorithms, particularly Random Forest, to refine the dataset to seventy genes of utmost relevance for tumor classification, and the application …


Xgboost Hyperberd Model Using Steam Platform, Yuh-Haur Chen Jan 2024

Data Science and Data Mining

This project investigates game pricing strategies in the Steam market using an XGBoost model, drawing motivation from Professor Xie's lecture, and presenting findings through a density plot that delineates two primary pricing strategies. A free-to-play approach, indicated by a significant hot spot, is adopted by developers focusing on post-purchase revenues through DLC, aesthetic purchases, and in-game transactions. This sailing strategy includes community-centric developers aiming to distribute their games for player engagement rather than profit.

The project illustrates the effectiveness of advanced modeling techniques in handling complex datasets, with significant predictive accuracy reflected by a reduced MSE from 0.3472 to 0.1397. …


Learning Optimal Inter-Class Margin Adaptively For Few-Shot Class-Incremental Learning Via Neural Collapse-Based Meta-Learning, Hang Ran, Weijun Li, Lusi Li, Songsong Tian, Xin Ning, Prayag Tiwari Jan 2024

Computer Science Faculty Publications

Few-Shot Class-Incremental Learning (FSCIL) aims to learn new classes incrementally with a limited number of samples per class. It faces issues of forgetting previously learned classes and overfitting on few-shot classes. An efficient strategy is to learn features that are discriminative in both base and incremental sessions. Current methods improve discriminability by manually designing inter-class margins based on empirical observations, which can be suboptimal. The emerging Neural Collapse (NC) theory provides a theoretically optimal inter-class margin for classification, serving as a basis for adaptively computing the margin. Yet, it is designed for closed, balanced data, not for sequential or few-shot …


Osfs-Vague: Online Streaming Feature Selection Algorithm Based On A Vague Set, Jie Yang, Zhijun Wang, Guoyin Wang, Yanmin Liu, Yi He, Di Wu Jan 2024

Computer Science Faculty Publications

Online streaming feature selection (OSFS), as an online learning manner to handle streaming features, is critical in addressing high-dimensional data. In real big data-related applications, the patterns and distributions of streaming features constantly change over time due to dynamic data generation environments. However, existing OSFS methods rely on presented and fixed hyperparameters, which undoubtedly lead to poor selection performance when encountering dynamic features. To make up for the existing shortcomings, the authors propose a novel OSFS algorithm based on vague set, named OSFS-Vague. Its main idea is to combine uncertainty and three-way decision theories to improve feature selection from the …


Eluquant: Event-Level Uncertainty Quantification In Deep Inelastic Scattering, Cristiano Fanelli, James Giroux Jan 2024

Arts & Sciences Articles

We introduce a physics-informed Bayesian neural network with flow-approximated posteriors using multiplicative normalizing flows for detailed uncertainty quantification (UQ) at the physics event-level. Our method is capable of identifying both heteroskedastic aleatoric and epistemic uncertainties, providing granular physical insights. Applied to deep inelastic scattering (DIS) events, our model effectively extracts the kinematic variables x, Q², and y, matching the performance of recent deep learning regression techniques but with the critical enhancement of event-level UQ. This detailed description of the underlying uncertainty proves invaluable for decision-making, especially in tasks like event filtering. It also allows for the reduction of true inaccuracies …


Data Science In Finance: Challenges And Opportunities, Xianrong Zheng, Elizabeth Gildea, Sheng Chai, Tongxiao Zhang, Shuxi Wang Jan 2024

Information Technology & Decision Sciences Faculty Publications

Data science has become increasingly popular due to emerging technologies, including generative AI, big data, deep learning, etc. It can provide insights from data that are hard to determine from a human perspective. Data science in finance helps to provide more personal and safer experiences for customers and develop cutting-edge solutions for a company. This paper surveys the challenges and opportunities in applying data science to finance. It provides a state-of-the-art review of financial technologies, algorithmic trading, and fraud detection. Also, the paper identifies two research topics. One is how to use generative AI in algorithmic trading. The other is …


Bootstrap Regression For Investigating Macroeconomics Factors Affecting Usa Home Prices, Benedict Kongyir, Emil Agbemade Jan 2024

Data Science and Data Mining

This study investigates the impact of macroeconomic indicators on US home prices, underscoring the importance of understanding these dynamics due to their significant socioeconomic consequences. Utilizing a dataset from Kaggle, originally collected by FRED, the research examines variables like the Consumer Price Index, Population, Unemployment, GDP, Stock Prices, Income, and Mortgage Rate to discern their effect on housing market fluctuations. The analysis identifies multicollinearity among predictors, necessitating a shift from traditional multiple linear regression to a more robust bootstrap regression method due to violations of parametric assumptions. Key findings reveal that Real Disposable Income is a significant predictor of home …
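
The idea behind bootstrap regression can be sketched as follows: resample the observed (x, y) pairs with replacement, refit the model on each resample, and read a confidence interval off the resulting distribution of coefficients instead of relying on parametric assumptions. The single-predictor setup, the percentile interval, the function names, and the synthetic data below are all assumptions for illustration; the study works with several macroeconomic predictors and its exact procedure is not given in the listing.

```python
import random
from statistics import mean

def slope(xs, ys):
    """Least-squares slope of y on x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx

def bootstrap_slope_ci(xs, ys, n_boot=2000, alpha=0.05, seed=0):
    """Percentile confidence interval for the slope via pairs bootstrap."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # slope is undefined when every resampled x is identical
        by = [ys[i] for i in idx]
        slopes.append(slope(bx, by))
    slopes.sort()
    lower = slopes[int((alpha / 2) * n_boot)]
    upper = slopes[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Synthetic data: y is roughly 2 * x plus small fixed perturbations.
income = [float(x) for x in range(1, 11)]
noise = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, 0.4, -0.1, -0.5]
home_price = [2.0 * x + e for x, e in zip(income, noise)]

point_estimate = slope(income, home_price)
lower, upper = bootstrap_slope_ci(income, home_price)
```

An interval that excludes zero would suggest that the predictor matters, without requiring normally distributed residuals.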


Optimizing Ai With Advanced Data Structuring: A Comparative Analysis Of K-Means And Gmm Clustering Techniques, Amir Alipour Yengejeh Jan 2024

Data Science and Data Mining

This study presents a detailed comparison of K-means and Gaussian Mixture Model (GMM) clustering algorithms, illustrating their unique capabilities and limitations across various synthetic datasets. By utilizing metrics such as the Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI), the research provides nuanced insights into how these algorithms handle datasets with varying structures and complexities. For instance, while both K-means and GMM show robust performance on well-separated clusters, GMM demonstrates a distinct advantage in scenarios with overlapping clusters or unbalanced data distributions. Conversely, K-means excels in identifying clear, distinct groupings, highlighting its utility in simpler clustering contexts. This study …
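
For readers unfamiliar with the Adjusted Rand Index mentioned above, the following is a small standard-library sketch of the usual pair-counting formula. It only scores two label assignments against each other; it does not run K-means or GMM, and the label lists are invented examples rather than results from the study.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items."""
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions trivially agree
    # Pairs of items placed together in both labelings (contingency cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = together_true * together_pred / total_pairs  # agreement expected by chance
    max_index = (together_true + together_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings use a single cluster
    return (together_both - expected) / (max_index - expected)

# Cluster names do not matter: a relabeled but identical partition scores 1.0.
perfect = adjusted_rand_index([0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0])
# Splitting one true cluster into two lowers the score.
partial = adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2])
```

Because the index is corrected for chance, a random assignment scores near zero, which makes it easier to compare algorithms across datasets with different numbers of clusters.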


Methods That Support The Validation Of Agent-Based Models: An Overview And Discussion, Andrew Collins, Matthew Koehler, Christopher Lynch Jan 2024

Engineering Management & Systems Engineering Faculty Publications

Validation is the process of determining if a model adequately represents the system under study for the model’s intended purpose. Validation is a critical component in building the credibility of a simulation model with its end-users. Effectively conducting validation can be a daunting task for both novice and experienced simulation developers. Further compounding the difficult task of conducting validation is that there is no universally accepted approach for assessing a simulation. These challenges are particularly relevant to the paradigm of Agent-Based Modeling and Simulation (ABMS) because of the complexity found in these models’ mechanisms and in the real-world situations they …


Mhair: A Dataset Of Audio-Image Representations For Multimodal Human Actions, Muhammad Bilal Shaikh, Douglas Chai, Syed M. S. Islam, Naveed Akhtar Jan 2024

Research outputs 2022 to 2026

The audio-image representations for multimodal human actions (MHAiR) dataset contains six different image representations of the audio signals that capture the temporal dynamics of the actions in a very compact and informative way. The dataset was extracted from the audio recordings which were captured from an existing video dataset, i.e., UCF101. Each data sample is approximately 10 s long, and the overall dataset was split into 4893 training samples and 1944 testing samples. The resulting feature sequences were then converted into images, which can be used for human action recognition and other related tasks. These images can …


Tiny Machine Learning For Underwater Image Enhancement: Pruning And Quantization Approach, Dr Khaled Nagaty, The British University In Egypt, Andreas Pester Dr Dec 2023

Computer Science

Many people have expressed an interest in underwater image processing in a variety of fields, including underwater vehicle control, archaeology, marine biological studies, etc. Underwater exploration is becoming an increasingly important element of our lives, with applications ranging from underwater marine and creature research to pipeline and communication logistics, military use, touristic and entertainment use. Underwater images suffer from poor visibility, distortion, and poor quality for a variety of causes, including light propagation. The major issue arises when these images must be captured at depths greater than 500 feet and artificial lighting needs to be provided. Efficient algorithms and models …


Unmasking Shadows: Unraveling Crime Patterns In Nyc's Boroughs, Jack Hachicho, Muhammad Hassan Butt Dec 2023

Publications and Research

New York City's crime dynamics have been on the rise for decades. Brooklyn and The Bronx have been disproportionately affected. This research aims to understand the crime landscape in these boroughs to formulate effective policies. Using crime data from official sources, statistical analyses, and data visualizations, the study identifies patterns and trends. The data encompasses over 400,000 reported incidents collected over the past 10 years, meticulously categorized by borough, crime type, and demographic information. Brooklyn has the highest overall crime rate, followed by The Bronx. Most shooting victims are Black. This highlights the need for holistic community programs to address …


Climate Change Impact On Bridge Scour Risk In Ny State: A Gis-Based Risk Analysis Model, Muhammad Hassan Butt Dec 2023

Publications and Research

Bridge scour, the primary cause of bridge failure in the United States, escalates post-severe storms, necessitating effective mitigation. This study employs a GIS-based risk analysis model to assess climate change's impact on bridge scour and associated risks in New York State. Data from the National Bridge Inventory, climate hazard maps, and geospatial data are integrated.