Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Data Science

Articles 1 - 30 of 556

Full-Text Articles in Physical Sciences and Mathematics

The Budget Proposal As A Constructive Collections Engagement Tool And Practice, Evan Rusch, Pat Lienemann, Heidi J. Southworth, Nat Gustafson-Sundell Jun 2024

Library Services Publications

At Minnesota State University Mankato (MNSU), our story might sound familiar. After more than a decade of flat or decreased budgets, we have cancelled hundreds of journal subscriptions and numerous journal packages. We do occasionally add journals, but only by cancelling others. We often say we are “managing library decline,” and our primary objective is to cut cleanly and accurately, so that we can continue to support accreditation and an evolving curriculum. Over the years, we have developed various tools to guide our cuts and we have demonstrated these at NASIG and elsewhere. These tools have also served for collection …


Cyberbullying Detection On Twitter Data Using Machine Learning Classifiers, Pradip Dhakal May 2024

Data Science and Data Mining

This study compares several popular machine learning techniques, namely Logistic Regression, Multinomial Naive Bayes, K-Nearest Neighbor, and Extreme Gradient Boosting, for classifying tweets into three categories: cyberbullying based on religion, cyberbullying based on ethnicity, or no cyberbullying. First, various data-cleaning approaches are applied to the tweet data. Once the data are clean, text vectorization techniques, namely bag of words and term frequency-inverse document frequency (TF-IDF), are used to convert the words into numerical vectors. Finally, the models are fitted using combinations of these vectorization techniques and machine …
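For readers unfamiliar with the two text representations named above, here is a minimal Python sketch of bag-of-words counts and TF-IDF weights. It assumes simple whitespace tokenization and the smoothed IDF variant idf(t) = ln((1 + n) / (1 + df(t))) + 1 with L2 row normalization (the scikit-learn `TfidfVectorizer` defaults). The study may have used a different variant or library, and the example tweets are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return the sorted vocabulary and one raw term-count vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Weight bag-of-words counts by smoothed inverse document frequency."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    # Smoothed IDF (assumed variant): ln((1 + n) / (1 + df)) + 1.
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    weighted = []
    for row in counts:
        vec = [c * w for c, w in zip(row, idf)]
        # L2-normalize so that document length does not dominate.
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        weighted.append([v / norm for v in vec])
    return vocab, weighted


if __name__ == "__main__":
    tweets = [  # hypothetical examples, not from the study's dataset
        "you are a great friend",
        "go back to your country",
        "great game last night",
    ]
    vocab, vectors = tf_idf(tweets)
    print(vocab)
    print([round(v, 3) for v in vectors[0]])
```

Terms that appear in many tweets (here "great") receive a lower weight than terms unique to one tweet, which is the property that distinguishes TF-IDF from raw counts.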


Internet-Based Data Platforms Re-Define The Distributions Of Some Large Crabronid Wasps In Arkansas (Hymenoptera: Crabronidae), David E. Bowles May 2024

Insecta Mundi

The geographic distributions of three large wasps, Sphecius speciosus (Drury), Stictia carolina Fabricius, and Stizus brevipennis Walsh (Hymenoptera: Crabronidae), occurring in Arkansas are defined using museum specimens and three internet-based data platforms. The internet-based data platforms generally provided more county location records than museum specimens did. For easily identified species, data from internet sources can better illustrate known distributions, making these platforms a powerful tool for elucidating distributional patterns and for conservation planning.

ZooBank registration. urn:lsid:zoobank.org:pub:DCAE9192-1765-40CD-952B-0A094F413991


Detecting Drifts In Data Streams Using Kullback-Leibler (KL) Divergence Measure For Data Engineering Applications, Jeomoan Francis Kurian, Mohamed Allali May 2024

Engineering Faculty Articles and Research

The exponential growth of data, coupled with the widespread application of artificial intelligence (AI), presents organizations with challenges in upholding data accuracy, especially within data engineering functions. While the extraction, transformation, and loading (ETL) process addresses error-free data ingestion, validating the content within data streams remains a challenge. Prompt detection and remediation of data issues are crucial, especially in automated analytical environments driven by AI. To address these issues, this study focuses on detecting drifts in data distributions and divergence within data fields processed from different sample populations. Using a hypothetical banking scenario, we illustrate the impact of data drift on automated …
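The core computation behind KL-based drift detection can be sketched as follows, assuming the values of a data field have already been bucketed into the same bins for a reference window and a new window. The smoothing constant `eps` (which keeps empty bins from producing division by zero), the alert threshold of 0.1, and the example counts are illustrative assumptions, not values from the study.

```python
import math


def normalize(counts, eps=1e-9):
    """Turn bin counts into a probability distribution, smoothing empty bins."""
    smoothed = [c + eps for c in counts]
    total = sum(smoothed)
    return [c / total for c in smoothed]


def kl_divergence(p_counts, q_counts, eps=1e-9):
    """KL(P || Q) in nats for two histograms defined over the same bins."""
    if len(p_counts) != len(q_counts):
        raise ValueError("histograms must share the same bins")
    p = normalize(p_counts, eps)
    q = normalize(q_counts, eps)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def detect_drift(reference, current, threshold=0.1):
    """Flag drift when the current window diverges from the reference window."""
    score = kl_divergence(current, reference)
    return score, score > threshold


if __name__ == "__main__":
    # Hypothetical counts of transaction amounts per bucket (banking-style field).
    reference = [500, 300, 150, 50]
    stable = [480, 310, 160, 50]
    shifted = [100, 200, 300, 400]
    print(detect_drift(reference, stable))   # small score, no drift flagged
    print(detect_drift(reference, shifted))  # large score, drift flagged
```

KL divergence is asymmetric, so the direction (here current relative to reference) is a design choice; a symmetric alternative such as Jensen-Shannon divergence could be swapped in if that matters for a given pipeline.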


Data Analysis Project For Preferred Credit Inc., Emily Smith, Greta Nesbit, Jack Simonet, Ignacio Sanchez-Romero May 2024

Celebrating Scholarship and Creativity Day (2018-)

This project focuses on transforming real data from PCI's operations into valuable insights through coding, data cleaning, and visualization. By leveraging advanced techniques, the project aims to uncover key trends and create visually compelling representations to aid decision-making within the company. The outcome will give PCI stakeholders the ability to extract valuable insights, optimize processes, and drive initiatives for growth and competitive advantage in the finance industry.


Computational Linguistics And Multilingualism: A Comparative Analysis With Spanish And English Data, Evelyn Lawrie May 2024

Student Scholar Symposium Abstracts and Posters

Computational linguistics is an increasingly ubiquitous field, serving as a basis for artificial intelligence and machine translation. It aims to analyze the syntax and semantics of individual words and phrases. While computational linguistics strategies for English have advanced considerably, those for other languages have not been developed as thoroughly. This lack of emphasis on multilingualism has contributed to the disappearance of Hispanic perspectives in the digital world, especially those of indigenous heritage, as the decline of many indigenous languages has been exacerbated by the lack of digital translation services. Sentiment analysis is a branch of computational linguistics that …


Develop An Interactive Python Dashboard For Analyzing EZproxy Logs, Andy Huff, Matthew Roth, Weiling Liu Apr 2024

Faculty Scholarship

This paper describes the development of an interactive Python dashboard for EZproxy log data. The dashboard is intended to support evidence-based decision-making in electronic resources management and to help explore the impact of library use.
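As an illustration of the first step such a dashboard needs, the sketch below parses log lines and aggregates successful requests per day and per requested host. It assumes the logs are written in NCSA common log format (for example via a `LogFormat %h %l %u %t "%r" %s %b` directive); sites often customize this, so the regular expression would need to match the local configuration. The function names and sample lines are hypothetical, not taken from the paper.

```python
import re
from collections import Counter
from datetime import datetime
from urllib.parse import urlsplit

# Assumed NCSA common log format: host ident user [time] "request" status bytes.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<proto>[^"]+)" (?P<status>\d{3}) (?P<bytes>\S+)'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    # Proxied requests carry a full URL; keep just the target hostname.
    rec["target"] = urlsplit(rec["url"]).hostname
    return rec


def summarize(lines):
    """Count successful (2xx) requests per day and per target host."""
    per_day, per_target = Counter(), Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is None or not rec["status"].startswith("2"):
            continue
        per_day[rec["time"].date().isoformat()] += 1
        per_target[rec["target"]] += 1
    return per_day, per_target


if __name__ == "__main__":
    sample = [  # hypothetical log lines
        '10.0.0.1 - alice [01/Apr/2024:09:15:02 -0500] '
        '"GET http://www.jstor.org:80/stable/123 HTTP/1.1" 200 5120',
        '10.0.0.2 - bob [01/Apr/2024:10:01:44 -0500] '
        '"GET https://www.sciencedirect.com:443/science/article/x HTTP/1.1" 200 20480',
    ]
    print(summarize(sample))
```

The resulting counters are plain dictionaries, so they could be handed to whichever charting or dashboard library is in use.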


Cradle Explorer: Casfer Interactive Platform For Data And Model Visualization, Olatunde D. Akanbi, Vibha S. Mandayam, Haiping Ai, Arafath Nihar, Erika I. Barcelos, Laura S. Bruckman, Jeffrey Yarus, Yinghui Wu, Huichun (Judy) Zhang, Roger H. French Apr 2024

Student Scholarship

No abstract provided.


Data Engineering: Building Software Efficiency In Medium To Large Organizations, Alessandro De La Torre Apr 2024

Whittier Scholars Program

The introduction of PoetHQ, a mobile application, offers an economical strategy for colleges, potentially ushering in significant cost savings. These savings could be redirected towards enhancing academic programs and services, enriching the educational landscape for students. PoetHQ aims to democratize access to crucial software, effectively removing financial barriers and facilitating a richer educational experience. By providing an efficient software solution that reduces organizational overhead while maximizing accessibility for students, the project highlights the essential role of equitable education and resource optimization within academic institutions.


Urinalysis Test Data Analysis And Prediction, Nikhil Mhatre Apr 2024

2024 Datathon Challenges

OUTLIERS Team submission to the Urinalysis Test Results Timed Challenge

We researched various algorithms, such as boosting and random forests, learned about their strengths and weaknesses, and applied them accordingly to address the issues found in the dataset.


Artificial Intelligence Could Probably Write This Essay Better Than Me, Claire Martino Apr 2024

Augustana Center for the Study of Ethics Essay Contest

No abstract provided.


Transcriptional Dynamics During Rhodococcus Erythropolis Infection With Phage WC1, Dana Willner, Sudip Paudel, Andrew D. Halleran, Grace E. Solini, Veronica Gray, Margaret Saha Apr 2024

Arts & Sciences Articles

Background

Belonging to the Actinobacteria phylum, members of the Rhodococcus genus thrive in soil, water, and even intracellularly. While most species are non-pathogenic, several cause respiratory disease in animals and, more rarely, in humans. Over 100 phages that infect Rhodococcus species have been isolated, but despite their importance for Rhodococcus ecology and biotechnology applications, little is known about the molecular genetic interactions between phage and host during infection. To address this need, we report an RNA-Seq analysis of a novel Rhodococcus erythropolis phage, WC1, analyzing both the phage and host transcriptomes at various stages throughout the infection process.

Results

By five …


Health And Healthcare: Designing For The Social Determinants Of Health And Blue Zones In North Nashville, Rebecca Tonguis, Honor Thomas, Olivia Hobbs Apr 2024

Belmont University Research Symposium (BURS)

Owned by North Nashville’s First Community Church, a now empty site in the Osage-North Fisk neighborhood of North Nashville has been identified as a potential site for a new location of The Store, in addition to a community-centric architectural development based on the social determinants of health and informed by the principles behind Blue Zones, the locations with the highest lifespans in the world. Opened by Brad Paisley and Kimberly Williams-Paisley, The Store is a free grocery store that “allow[s] people to shop for their basic needs in a way that protects dignity and fosters hope”, for which North Nashville …


Demographic Data Analysis For Measuring Economic Impact Of The Branch Of Nashville, Tessa Pendleton, Annie Wardroup, Nicole Speyrer, Kimberly Amaya Hernandez Apr 2024

Belmont University Research Symposium (BURS)

As part of the Global Honors Scholars Collaborative, researchers aggregated data from The Belmont Data Collaborative to analyze the three primary ZIP codes (37211, 37013, 37217) served by The Branch of Nashville. These communities include immigrant and refugee populations, whom The Branch supports through its food bank, English classes, and other comprehensive care. Future program development will rely on an analysis of the current client base and an eventual assessment of The Branch's economic impact on the surrounding community. The goal of this research for The Branch of Nashville is twofold: (1) analyze the existing demographics within the above ZIP codes …


Anomaly Detection On Small Wind Turbine Blades Using Deep Learning Algorithms, Bridger Altice, Edwin Nazario, Mason Davis, Mohammad Shekaramiz, Todd K. Moon, Mohammad A. S. Masoum Feb 2024

Electrical and Computer Engineering Faculty Publications

Wind turbine blade maintenance is expensive, dangerous, time-consuming, and prone to misdiagnosis. A potential solution to aid preventative maintenance is to use deep learning and drones for inspection and early fault detection. In this research, five base deep learning architectures are investigated for anomaly detection on wind turbine blades: Xception, ResNet-50, AlexNet, VGG-19, and a custom convolutional neural network. For further analysis, transfer learning approaches were also proposed and developed, using these architectures as the feature extraction layers. To evaluate model performance, a new dataset containing 6000 RGB images was created, making use of indoor and …


Artificial Intelligence For The Electron Ion Collider (AI4EIC), C. Allaire, ..., Cristiano Fanelli, James Giroux, Joey Niestroy, Justin R. Stevens, Patrick Stone, L. Suarez, K. Suresh, Eric Walter, Et Al. Feb 2024

Arts & Sciences Articles

The Electron-Ion Collider (EIC), a state-of-the-art facility for studying the strong force, is expected to begin commissioning its first experiments in 2028. This is an opportune time for artificial intelligence (AI) to be included from the start at this facility and in all phases that lead up to the experiments. The second annual workshop organized by the AI4EIC working group, which recently took place, centered on exploring all current and prospective application areas of AI for the EIC. This workshop is not only beneficial for the EIC, but also provides valuable insights for the newly established ePIC collaboration at EIC. …


Henderson Named One Of The Most Influential People In Legal Education, James Owsley Boyd Jan 2024

Keep Up With the Latest News from the Law School (blog)

Indiana University Maurer School of Law Professor Bill Henderson has once again been recognized as one of the most influential people in legal education, but he’s not the only one with ties to the Law School on this year’s list.

The National Jurist ranked Henderson #18 on its list. Kellye Testy, a 1991 alumna of the Law School and president and CEO of the Law School Admission Council, is ranked second.


Predicting Superconducting Critical Temperature Using Regression Analysis, Roland Fiagbe Jan 2024

Data Science and Data Mining

This project fits a regression model to predict the superconducting critical temperature from variables extracted from each superconductor's chemical formula. Combined with stepwise variable selection, the regression yields a reasonably good predictive model with lower prediction error (mean squared error, MSE). Variables derived from atomic radius, valence, atomic mass, and thermal conductivity appeared to contribute most to the predictive model.


A Holistic Approach To Performance Prediction In Collegiate Athletics: Player, Team, And Conference Perspectives, Christopher Taber, S. Sharma, Mehul S. Raval, Samah Senbel, Allison Keefe, Jui Shah, Emma Patterson, Julie K. Nolan, N.S. Artan, Tolga Kaya Jan 2024

Exercise Science Faculty Publications

Predictive sports data analytics can be revolutionary for sports performance. Existing literature discusses players' or teams' performance, independently or in tandem. Using machine learning (ML), this paper aims to holistically evaluate player-, team-, and conference (season)-level performance in Division I women's basketball. The players were monitored and tested throughout a full competitive year. Performance was quantified at the player level using the reactive strength index modified (RSImod), at the team level by the game score (GS) metric, and at the conference level through the Player Efficiency Rating (PER). The data include parameters from training, subjective stress, sleep, and recovery (WHOOP …
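Two of the performance measures named above have widely used closed forms, sketched here under stated assumptions: RSImod as countermovement-jump height divided by time to take-off, and game score in John Hollinger's box-score formulation. The paper may compute either metric differently (for instance, aggregating game score to the team level), and PER is omitted because it depends on league-wide adjustment factors. The function names are hypothetical.

```python
def rsi_mod(jump_height_m, time_to_takeoff_s):
    """Reactive strength index modified: jump height (m) / time to take-off (s)."""
    if time_to_takeoff_s <= 0:
        raise ValueError("time to take-off must be positive")
    return jump_height_m / time_to_takeoff_s


def game_score(pts, fg, fga, ft, fta, orb, drb, stl, ast, blk, pf, tov):
    """Hollinger's game score from a single box-score line (assumed formulation)."""
    return (pts + 0.4 * fg - 0.7 * fga - 0.4 * (fta - ft)
            + 0.7 * orb + 0.3 * drb + stl + 0.7 * ast + 0.7 * blk
            - 0.4 * pf - tov)


if __name__ == "__main__":
    # Hypothetical countermovement jump: 0.30 m height, 0.75 s to take-off.
    print(round(rsi_mod(0.30, 0.75), 2))  # 0.4
    # Hypothetical box score: 20 pts, 8/15 FG, 4/5 FT, 2 ORB, 6 DRB,
    # 1 STL, 3 AST, 1 BLK, 2 PF, 3 TOV.
    print(round(game_score(20, 8, 15, 4, 5, 2, 6, 1, 3, 1, 2, 3), 1))  # 15.5
```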


Advancing Cancer Classification Through Machine Learning Analysis Of RNA-Seq Gene Expression Data, Emil Agbemade, Amina Issoufou Anaroua, Dimitri Bamba Jan 2024

Data Science and Data Mining

This study delves into the classification of various cancer types using the RNA-Seq (HiSeq) PANCAN dataset from the UCI Machine Learning Repository, which encompasses a rich collection of gene expression data across multiple tumor samples. To improve cancer diagnosis and treatment, our methodology confronts the challenges inherent in high-dimensional datasets, such as the Hughes Effect and the Curse of Dimensionality, through innovative feature selection methods and machine learning approaches. A key component of our strategy includes the use of tree-based algorithms, particularly Random Forest, to refine the dataset to seventy genes of utmost relevance for tumor classification, and the application …


ELUQuant: Event-Level Uncertainty Quantification In Deep Inelastic Scattering, Cristiano Fanelli, James Giroux Jan 2024

Arts & Sciences Articles

We introduce a physics-informed Bayesian neural network with flow-approximated posteriors using multiplicative normalizing flows for detailed uncertainty quantification (UQ) at the physics event level. Our method is capable of identifying both heteroskedastic aleatoric and epistemic uncertainties, providing granular physical insights. Applied to deep inelastic scattering (DIS) events, our model effectively extracts the kinematic variables x, Q², and y, matching the performance of recent deep learning regression techniques but with the critical enhancement of event-level UQ. This detailed description of the underlying uncertainty proves invaluable for decision-making, especially in tasks like event filtering. It also allows for the reduction of true inaccuracies …
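For orientation, the three kinematic quantities named above follow the standard DIS definitions, given here from textbook kinematics rather than from the paper. With incoming lepton four-momentum k, scattered lepton four-momentum k', proton four-momentum P, momentum transfer q = k - k', and particle masses neglected so that s is approximately 2 P·k:

```latex
Q^2 = -q^2, \qquad
x = \frac{Q^2}{2\,P \cdot q}, \qquad
y = \frac{P \cdot q}{P \cdot k}, \qquad
Q^2 \approx s\,x\,y \quad \text{with } s = (P + k)^2 \approx 2\,P \cdot k .
```

Multiplying the definitions gives x·y·s = Q², so for fixed beam energies only two of the three variables are independent.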


Methods That Support The Validation Of Agent-Based Models: An Overview And Discussion, Andrew Collins, Matthew Koehler, Christopher Lynch Jan 2024

Engineering Management & Systems Engineering Faculty Publications

Validation is the process of determining if a model adequately represents the system under study for the model’s intended purpose. Validation is a critical component in building the credibility of a simulation model with its end-users. Effectively conducting validation can be a daunting task for both novice and experienced simulation developers. Further compounding the difficult task of conducting validation is that there is no universally accepted approach for assessing a simulation. These challenges are particularly relevant to the paradigm of Agent-Based Modeling and Simulation (ABMS) because of the complexity found in these models’ mechanisms and in the real-world situations they …


Bootstrap Regression For Investigating Macroeconomics Factors Affecting USA Home Prices, Benedict Kongyir, Emil Agbemade Jan 2024

Data Science and Data Mining

This study investigates the impact of macroeconomic indicators on US home prices, underscoring the importance of understanding these dynamics due to their significant socioeconomic consequences. Utilizing a dataset from Kaggle, originally collected by FRED, the research examines variables like the Consumer Price Index, Population, Unemployment, GDP, Stock Prices, Income, and Mortgage Rate to discern their effect on housing market fluctuations. The analysis identifies multicollinearity among predictors, necessitating a shift from traditional multiple linear regression to a more robust bootstrap regression method due to violations of parametric assumptions. Key findings reveal that Real Disposable Income is a significant predictor of home …
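A minimal sketch of one common bootstrap-regression variant, the case (pairs) bootstrap with a percentile confidence interval, is shown below for a single predictor. The study fits several macroeconomic predictors and may have used a different resampling scheme; the single-predictor simplification, the 2,000 resamples, the seed, and the example data are illustrative assumptions.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Least-squares slope of y on x for one predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variance in this sample")
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


def bootstrap_slope_ci(xs, ys, n_boot=2000, alpha=0.05, seed=0):
    """Case-resampling bootstrap: refit the slope on rows drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # degenerate resample: slope undefined, so draw again
        slopes.append(ols_slope(bx, [ys[i] for i in idx]))
    slopes.sort()
    # Percentile interval from the sorted bootstrap slopes.
    lo = slopes[int((alpha / 2) * n_boot)]
    hi = slopes[int((1 - alpha / 2) * n_boot) - 1]
    return ols_slope(xs, ys), (lo, hi)


if __name__ == "__main__":
    # Hypothetical data: disposable income (x) vs. a home price index (y).
    income = [10.1, 10.4, 10.9, 11.3, 11.8, 12.2, 12.9, 13.4]
    prices = [140, 146, 151, 163, 170, 176, 190, 197]
    est, (lo, hi) = bootstrap_slope_ci(income, prices)
    print(f"slope={est:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

Because the interval comes from the empirical distribution of refitted slopes, it does not rely on normally distributed residuals, which is the motivation the abstract gives for moving away from classical parametric inference.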


Learning Optimal Inter-Class Margin Adaptively For Few-Shot Class-Incremental Learning Via Neural Collapse-Based Meta-Learning, Hang Ran, Weijun Li, Lusi Li, Songsong Tian, Xin Ning, Prayag Tiwari Jan 2024

Computer Science Faculty Publications

Few-Shot Class-Incremental Learning (FSCIL) aims to learn new classes incrementally with a limited number of samples per class. It faces issues of forgetting previously learned classes and overfitting on few-shot classes. An efficient strategy is to learn features that are discriminative in both base and incremental sessions. Current methods improve discriminability by manually designing inter-class margins based on empirical observations, which can be suboptimal. The emerging Neural Collapse (NC) theory provides a theoretically optimal inter-class margin for classification, serving as a basis for adaptively computing the margin. Yet, it is designed for closed, balanced data, not for sequential or few-shot …
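The "theoretically optimal inter-class margin" of Neural Collapse refers to class prototypes forming a simplex equiangular tight frame (ETF): K unit vectors whose pairwise cosine similarity is -1/(K-1), the most negative value that K equiangular unit vectors can share. The sketch below builds that fixed geometry from the columns of sqrt(K/(K-1)) (I - (1/K) 11ᵀ) and checks the angle. It illustrates only the target geometry, not the paper's adaptive, meta-learned margin; the function names are hypothetical.

```python
import math


def simplex_etf(k):
    """Return K prototype vectors in R^K that form a simplex ETF."""
    if k < 2:
        raise ValueError("need at least two classes")
    scale = math.sqrt(k / (k - 1))
    # Column j of sqrt(K/(K-1)) * (I - (1/K) * 11^T): unit norm, centered.
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for i in range(k)]
        for j in range(k)
    ]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


if __name__ == "__main__":
    protos = simplex_etf(4)
    print(round(cosine(protos[0], protos[1]), 4))  # -0.3333, i.e. -1/(K-1)
```

Every pair of classes is separated by the same, maximal angle, which is why the ETF is a natural reference point; the abstract's point is that this geometry assumes closed, balanced data, which incremental few-shot settings violate.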


OSFS-Vague: Online Streaming Feature Selection Algorithm Based On A Vague Set, Jie Yang, Zhijun Wang, Guoyin Wang, Yanmin Liu, Yi He, Di Wu Jan 2024

Computer Science Faculty Publications

Online streaming feature selection (OSFS), an online learning approach for handling streaming features, is critical for addressing high-dimensional data. In real big-data applications, the patterns and distributions of streaming features change constantly over time because the data are generated in dynamic environments. However, existing OSFS methods rely on preset, fixed hyperparameters, which lead to poor selection performance when features are dynamic. To address these shortcomings, the authors propose a novel OSFS algorithm based on a vague set, named OSFS-Vague. Its main idea is to combine uncertainty and three-way decision theories to improve feature selection from the …


Quantification Of Landside Congestion In Ports: An Analysis Based On GPS Data, Kumushini Thennakoon, Namal Bandaranayake, Senevi Kiridena, Asela K. Kulatunga Jan 2024

Computer Science Faculty Publications

Hinterland transport is a critical segment of maritime cross-border logistics, linking the end users of global supply chains to the maritime segment. Truck-based hinterland transport is known to cause congestion in and around ports. This study aimed to quantify the congestion caused by trucks at the Port of Colombo, which had not previously been the subject of a systematic study. To this end, the study makes use of GPS data. In addition to revealing heavy congestion within the port, the study reveals significant variation in congestion across the day, with journey durations peaking from 1200hrs …
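A simplified sketch of how GPS pings might be turned into journey durations by time of day is given below. It assumes each ping has already been tagged as inside or outside the port area, treats a journey as a maximal run of consecutive in-port pings for one truck, and averages durations by the hour a journey starts. The record layout `(truck_id, timestamp, inside_port)` and the function names are hypothetical, not the study's method.

```python
from collections import defaultdict
from datetime import datetime


def journeys(pings):
    """Split each truck's time-ordered pings into in-port journeys.

    A journey is a maximal run of consecutive pings flagged inside the port;
    its duration is the last ping time minus the first ping time.
    """
    by_truck = defaultdict(list)
    for truck_id, ts, inside in pings:
        by_truck[truck_id].append((ts, inside))
    result = []
    for truck_id, track in by_truck.items():
        track.sort()  # order pings by timestamp
        start = prev = None
        for ts, inside in track:
            if inside:
                if start is None:
                    start = ts
                prev = ts
            elif start is not None:
                result.append((truck_id, start, prev))
                start = None
        if start is not None:
            result.append((truck_id, start, prev))
    return result


def mean_minutes_by_start_hour(pings):
    """Average journey duration in minutes, grouped by the hour a journey starts."""
    totals = defaultdict(lambda: [0.0, 0])
    for _, start, end in journeys(pings):
        bucket = totals[start.hour]
        bucket[0] += (end - start).total_seconds() / 60
        bucket[1] += 1
    return {hour: s / n for hour, (s, n) in sorted(totals.items())}


if __name__ == "__main__":
    day = datetime(2024, 1, 15)
    sample = [  # hypothetical pings: (truck_id, timestamp, inside_port)
        ("T1", day.replace(hour=11, minute=50), True),
        ("T1", day.replace(hour=13, minute=5), True),
        ("T1", day.replace(hour=13, minute=20), False),
        ("T2", day.replace(hour=12, minute=15), True),
        ("T2", day.replace(hour=14, minute=0), True),
    ]
    print(mean_minutes_by_start_hour(sample))  # {11: 75.0, 12: 105.0}
```

Real GPS traces would also need geofencing, handling of signal gaps, and filtering of parked trucks, all of which this sketch leaves out.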


XGBoost Hyperberd Model Using Steam Platform, Yuh-Haur Chen Jan 2024

Data Science and Data Mining

This project investigates game pricing strategies in the Steam market using an XGBoost model, drawing motivation from Professor Xie's lecture, and presents its findings through a density plot that delineates two primary pricing strategies. A free-to-play approach, indicated by a significant hot spot, is adopted by developers who focus on post-purchase revenue through DLC, aesthetic purchases, and in-game transactions. This pricing strategy also includes community-centric developers who aim to distribute their games for player engagement rather than profit.

The project illustrates the effectiveness of advanced modeling techniques in handling complex datasets, with improved predictive accuracy reflected in a reduction of the MSE from 0.3472 to 0.1397. …


Combating Cyberbullying On Social Media: A Machine Learning Approach With Text Analysis On Twitter, Amir Alipour Yengejeh Jan 2024

Data Science and Data Mining

The popularity of mobile electronic devices, social media, and networking websites has increased tremendously in recent years. Many people around the world engage daily in a variety of online activities. Although users gain many benefits from these systems, such as exchanging ideas and information, socializing, and entertainment, they may also encounter adverse behaviors such as toxicity, bullying, extremism, and cruelty. Recent statistics report that such behaviors have grown noticeably online, to the point that they can threaten individuals and even entire communities. Thus, …


Machine Learning Approaches For Cyberbullying Detection, Roland Fiagbe Jan 2024

Data Science and Data Mining

Cyberbullying refers to bullying carried out through electronic means and the internet. In recent years, it has been identified as a major problem among young people and even adults. It can negatively affect one's emotions and lead to adverse outcomes such as depression, anxiety, harassment, and suicide, among others. This has created a need for machine learning techniques that automatically detect cyberbullying and help prevent it on social media platforms. In this study, we analyze combinations of Natural Language Processing (NLP) algorithms (such as Bag-of-Words and TF-IDF) with some popular machine learning …


Diagnostic In Neuroimaging: A Comparative Study Of Deep Learning And Traditional Approaches, Amina Issoufou Anaroua Jan 2024

Data Science and Data Mining

In the realm of medical diagnostics, precise classification of brain tumors is pivotal. This study conducts a comprehensive comparative analysis of a Convolutional Neural Network (CNN) against two traditional machine learning models, Logistic Regression (LR) and Support Vector Machines (SVM), on a dataset of MRI scans for multi-class brain tumor classification. The CNN, tailored for image recognition, is evaluated alongside LR and SVM, which have established benchmarks in classification tasks. The investigation reveals that the traditional models hold their ground in terms of precision and interpretability, with the SVM in particular achieving remarkable accuracy. However, the CNN distinguishes itself by demonstrating …