Open Access. Powered by Scholars. Published by Universities.®

Data Science Commons

Articles 1 - 30 of 1397

Full-Text Articles in Data Science

Transfer Learning In The Era Of Foundational Models: Application To Diagnosis In Rheumatology, Prashant Shekhar Feb 2024

Math Department Colloquium Series

Problems with current synovitis grading procedures

  • There has been a lack of reliability in grading these images in the medical community due to a lack of universally accepted diagnostic criteria [Momtazmanesh et al., 2022]
  • The human/machine variability creates an additional challenge in an efficient automated scoring system [Ranganath et al., 2022]
  • There is a lack of consistency between doctors in grading these images [Momtazmanesh et al., 2022]


Session 8: Machine Learning Based Behavior Of Non-Opec Global Supply In Crude Oil Price Determinism, Mofe Jeje Feb 2024

SDSU Data Science Symposium

Abstract

While studies on global oil price variability occasioned by OPEC crude oil supply are well documented in the energy literature, the impact of non-OPEC global oil supply on price variability has not received commensurate attention. Given this gap, the primary objective of this study is to estimate the magnitude of oil price determinism that is explained by the share of non-OPEC’s global crude oil supply. Using secondary data sources, data for the target variable, annual crude oil price variability, will be collected from the US Federal Reserve, while …


Principal Component Analysis With Application To Credit Card Data, Eleanor Cain, Semhar Michael, Gary Hatfield Feb 2024

SDSU Data Science Symposium

Principal Component Analysis (PCA) is a dimension reduction technique used in data analysis to preprocess data before building a model. In general, dimension reduction allows analysts to draw conclusions about large data sets by reducing the number of variables while retaining as much information as possible. Using the numerical variables from a data set, PCA computes a smaller set of uncorrelated variables, called principal components, that account for most of the variability in the data. The purpose of this poster is to understand PCA as well as perform PCA on a large sample credit …
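As a rough illustration of the idea described in this abstract, the following minimal sketch computes principal components with NumPy via a singular value decomposition of the centered data. It is not the poster's method: the function name `pca`, the synthetic data standing in for numerical credit card variables, and the choice not to standardize the variables are all assumptions made for the example.

```python
import numpy as np

def pca(X, n_components):
    """Project X (rows = observations) onto its leading principal components."""
    X = np.asarray(X, dtype=float)
    # Center each variable; PCA works on deviations from the column means.
    Xc = X - X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Variance along each direction is proportional to the squared singular value.
    explained = S**2 / np.sum(S**2)
    # Scores are the coordinates of each observation in the new basis.
    scores = Xc @ Vt[:n_components].T
    return scores, explained[:n_components]

# Hypothetical data: four correlated variables built from two latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 2.0, 0.0, 0.5],
                   [0.0, 0.3, 1.0, 1.5]])
X = latent @ mixing + 0.05 * rng.normal(size=(200, 4))

scores, ratio = pca(X, n_components=2)
print("share of variance explained by each component:", ratio)
```

In practice the variables are often standardized first when they are measured on different scales; that step is omitted here to keep the sketch short.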


Predicting Crop Yield Using Remote Sensing Data, Mary Row, Jung-Han Kimn, Hossein Moradi Feb 2024

SDSU Data Science Symposium

Accurate crop yield predictions can help farmers make adjustments or changes in their farming practices to optimize their harvest. Remote sensing is an inexpensive approach to collecting massive amounts of data that could be used for predicting crop yield. This study used linear regression and spatial linear models to predict soybean yield with data from Landsat 8 OLI. Each model was built using only spectral bands of the satellite, only vegetation indices, and both spectral bands and vegetation indices. All analysis was based on data collected from two fields in South Dakota from the 2019 and 2021 …


Session 6: Model-Based Clustering Analysis On The Spatial-Temporal And Intensity Patterns Of Tornadoes, Yana Melnykov, Yingying Zhang, Rong Zheng Feb 2024

SDSU Data Science Symposium

Tornadoes are among nature’s most violent windstorms and can occur all over the world except Antarctica. Previous scientific efforts have studied this natural hazard from facets such as genesis, dynamics, detection, forecasting, warning, measuring, and assessing. In contrast, we want to model tornado datasets using modern, sophisticated statistical and computational techniques. The goal of the paper is to develop novel finite mixture models and perform clustering analysis on the spatial-temporal and intensity patterns of tornadoes. To analyze the tornado dataset, we first try a Gaussian distribution with the mean vector and variance-covariance matrix represented as …


Clustering Of Patients With Heart Disease, Mukadder Cinar Feb 2024

Dissertations, Theses, and Capstone Projects

Heart disease, a leading cause of mortality worldwide, presents complex challenges in public health due to its varied manifestations. Accurate diagnosis and patient stratification are essential for effective management and improved outcomes. In response, this study employed machine learning techniques to analyze heart disease data obtained from the UCI Machine Learning Repository, aiming to enhance patient care through advanced data analysis.

The study began with the application of K-Nearest Neighbors (KNN) classification, which categorized patients into 'Disease' and 'No Disease' groups. This preliminary step provided initial insights into the structure of the dataset. Subsequently, K-means clustering was applied in two rounds, …


What Does One Billion Dollars Look Like?: Visualizing Extreme Wealth, William Mahoney Luckman Feb 2024

Dissertations, Theses, and Capstone Projects

The word “billion” is a mathematical abstraction related to “big,” but it is difficult to understand the vast difference in value between one million and one billion; even harder to understand the vast difference in purchasing power between one billion dollars, and the average U.S. yearly income. Perhaps most difficult to conceive of is what that purchasing power and huge mass of capital translates to in terms of power. This project blends design, text, facts, and figures into an interactive narrative website that helps the user better understand their position in relation to extreme wealth: https://whatdoesonebilliondollarslooklike.website/

The site incorporates …


Making Sense Of Making Parole In New York, Alexandra Mcglinchy Feb 2024

Dissertations, Theses, and Capstone Projects

For many individuals incarcerated in New York, the initial step toward freedom begins with an interview with the Board of Parole. This process, however, is frequently a complex and challenging one, characterized by repeated denials and extended incarcerations. The disparity in outcomes – where one individual may receive over 20 denials and another is granted parole on their first attempt – highlights the ambiguity and inconsistency in the parole decision-making process. This project aims to clarify the factors that influence parole decisions by concentrating on measurable variables. These include age, race, duration of sentence served, proportion of sentence served, type …


Modeling Of Covid-19 Clinical Outcomes In Mexico: An Analysis Of Demographic, Clinical, And Chronic Disease Factors, Livia Clarete Feb 2024

Dissertations, Theses, and Capstone Projects

This study explores COVID-19 clinical outcomes in Mexico, focusing on demographic, clinical, and chronic disease variables to develop predictive models. In the binary classification task, the Ada Boost Classifier distinguishes survivors from non-survivors, with age, sex, ethnicity, and chronic medical conditions influencing outcomes. In multiclass classification, the Gradient Boosting Classifier categorizes patients into outcome groups.

Demographic variables, especially age, are crucial for predicting COVID-19 outcomes for both the binary and multiclass classification tasks. Clinical information about previous conditions, including chronic diseases, also holds relevance, especially diabetes, immunocompromise, and cardiovascular diseases. These insights inform public health measures and healthcare strategies, emphasizing …


The Impact Of Accessible Data On Cyberstalking, Elise Kwan Jan 2024

The Journal of Purdue Undergraduate Research

No abstract provided.


Model Selection Through Cross-Validation For Supervised Learning Tasks With Manifold Data, Derek Brown Jan 2024

The Journal of Purdue Undergraduate Research

No abstract provided.


Machine Learning Of Big Data: A Gaussian Regression Model To Predict The Spatiotemporal Distribution Of Ground Ozone, Jerry Gu Jan 2024

The Journal of Purdue Undergraduate Research

Tracking pollution levels on the ground is important to the environment and public health. One of the pollutants of concern is ozone, which, at high concentrations, can cause respiratory and cardiovascular problems. The National Center for Atmospheric Research (NCAR) has published valuable ozone data obtained from ground-based sensors installed at selected locations. Because it is unfeasible to measure the exact ozone levels everywhere at any time, it would be valuable to predict the temporal-spatial distributions of ozone concentration based on existing data. This would help us better understand the patterns and trends in the data and make better decisions to …


A Computational Profile Of Invasive Lionfish In Belize: A New Insight On A Destructive Species, Joshua E. Balan Jan 2024

The Journal of Purdue Undergraduate Research

Since their discovery in the region in 2009, invasive Indonesian-native lionfish have been taking over the Belize Barrier Reef. As a result, populations of local species have dwindled as they are either eaten or outcompeted by the invaders. This has led to devastating losses ecologically and economically; massive industries in the local nations, such as fisheries and tourism, have suffered greatly. Attempting to combat this, local organizations, from nonprofits to ecotourism companies, have been manually spear-hunting them on scuba dives to cull the population. One such company, Reef Conservation Institute (ReefCI), operating out of Tom Owens Caye outside of Placencia, …


Henderson Named One Of The Most Influential People In Legal Education, James Owsley Boyd Jan 2024

Keep Up With the Latest News from the Law School (blog)

Indiana University Maurer School of Law Professor Bill Henderson has once again been recognized as one of the most influential people in legal education, but he’s not the only one with ties to the Law School on this year’s list.

The National Jurist ranked Henderson #18 on its list. Kellye Testy, a 1991 alumna of the Law School and president and CEO of the Law School Admission Council, is ranked second.


Molecular Understanding And Design Of Deep Eutectic Solvents And Proteins Using Computer Simulations And Machine Learning, Usman Lame Abbas Jan 2024

Theses and Dissertations--Chemical and Materials Engineering

Hydrophobic deep eutectic solvents (DESs) have emerged as excellent extractants. A major challenge is the lack of an efficient tool to discover DES candidates. Currently, the search relies heavily on the researchers’ intuition or a trial-and-error process, which leads to a low success rate or bypassing of promising candidates. DES performance depends on the heterogeneous hydrogen bond environment formed by multiple hydrogen bond donors and acceptors. Understanding this heterogeneous hydrogen bond environment can help develop principles for designing high performance DESs for extraction and other separation applications. This work investigates the structure and dynamics of hydrogen bonds in hydrophobic DESs …


A Bayesian Inversion For Emissions And Export Productivity Across The End-Cretaceous Boundary, Alexander A. Cox Jan 2024

Dartmouth College Master’s Theses

The end-Cretaceous mass extinction was marked by both the Chicxulub impact and the ongoing emplacement of the Deccan Traps flood basalt province. Both of these events perturbed the environment by the emission of climate-active volatiles, primarily CO2 and SO2. To understand the mechanism of extinction, we must disentangle the timing, duration, and intensity of volcanic and meteoritic environmental forcings. In this thesis, we used a parallel Markov chain Monte Carlo approach to invert for the aforementioned volatile emissions, export productivity, and remineralization from 67 to 65 million years ago using the LOSCAR (Long-term Ocean-atmosphere-Sediment CArbon cycle Reservoir) model. The parallel …


A Holistic Approach To Performance Prediction In Collegiate Athletics: Player, Team, And Conference Perspectives, Christopher Taber, S. Sharma, Mehul S. Raval, Samah Senbel, Allison Keefe, Jui Shah, Emma Patterson, Julie K. Nolan, N.S. Artan, Tolga Kaya Jan 2024

Exercise Science Faculty Publications

Predictive sports data analytics can be revolutionary for sports performance. Existing literature discusses players' or teams' performance, independently or in tandem. Using Machine Learning (ML), this paper aims to holistically evaluate player-, team-, and conference (season)-level performances in Division-1 Women's basketball. The players were monitored and tested through a full competitive year. The performance was quantified at the player level using the reactive strength index modified (RSImod), at the team level by the game score (GS) metric, and finally at the conference level through Player Efficiency Rating (PER). The data includes parameters from training, subjective stress, sleep, and recovery (WHOOP …


Combating Cyberbullying On Social Media: A Machine Learning Approach With Text Analysis On Twitter, Amir Alipour Yengejeh Jan 2024

Data Science and Data Mining

The popularity of mobile electronic devices, along with social media and networking websites, has increased tremendously in recent years. Most people around the world engage daily in a variety of cyberspace activities. Even though users can benefit from these systems by exchanging ideas and information, socializing, and enjoying themselves, they may also face adverse behaviors such as toxicity, bullying, extremism, and cruelty. Recent statistics report that these behaviors have grown noticeably in cyberspace, to the point that they can threaten individuals and even whole communities. Thus, …


Advancing Cancer Classification Through Machine Learning Analysis Of Rna-Seq Gene Expression Data, Emil Agbemade, Amina Issoufou Anaroua, Dimitri Bamba Jan 2024

Data Science and Data Mining

This study delves into the classification of various cancer types using the RNA-Seq (HiSeq) PANCAN dataset from the UCI Machine Learning Repository, which encompasses a rich collection of gene expression data across multiple tumor samples. To improve cancer diagnosis and treatment, our methodology confronts the challenges inherent in high-dimensional datasets, such as the Hughes Effect and the Curse of Dimensionality, through innovative feature selection methods and machine learning approaches. A key component of our strategy includes the use of tree-based algorithms, particularly Random Forest, to refine the dataset to seventy genes of utmost relevance for tumor classification, and the application …


Mhair: A Dataset Of Audio-Image Representations For Multimodal Human Actions, Muhammad Bilal Shaikh, Douglas Chai, Syed M. S. Islam, Naveed Akhtar Jan 2024

Research outputs 2022 to 2026

The audio-image representations for multimodal human actions (MHAiR) dataset contains six different image representations of the audio signals that capture the temporal dynamics of the actions in a very compact and informative way. The dataset was extracted from the audio recordings which were captured from an existing video dataset, i.e., UCF101. Each data sample was approximately 10 s long, and the overall dataset was split into 4893 training samples and 1944 testing samples. The resulting feature sequences were then converted into images, which can be used for human action recognition and other related tasks. These images can …


In Pursuit Of Consumption-Based Forecasting, Charles Chase, Kenneth B. Kahn Jan 2024

Marketing Faculty Publications

[Introduction] Today's most mature, most sophisticated, best-in-class forecasting is what we call consumption-based forecasting (CBF). In contrast, the least sophisticated companies typically do not forecast at all, but rather set financial targets based on management expectations. Companies beginning to use statistical forecasting techniques usually take a supply-centric orientation, relying on time series techniques applied to shipment and/or order history. The next stage of progression is to incorporate promotions data, economic data, and market data alongside supply-centric data so that regression and other advanced analytics can be used. Companies pursuing CBF utilize even more advanced capabilities to capture, examine, and understand …


Towards Algorithmic Justice: Human Centered Approaches To Artificial Intelligence Design To Support Fairness And Mitigate Bias In The Financial Services Sector, Jihyun Kim Jan 2024

CMC Senior Theses

Artificial Intelligence (AI) has positively transformed the financial services sector but has also introduced AI biases against protected groups, amplifying existing prejudices against marginalized communities. The financial decisions made by biased algorithms could cause life-changing ramifications in applications such as lending and credit scoring. Human Centered AI (HCAI) is an emerging concept where AI systems seek to augment, not replace, human abilities while preserving human control to ensure transparency, equity and privacy. The evolving field of HCAI shares a common ground with and can be enhanced by the Human Centered Design principles in that they both put humans, the user, at …


Reducing Food Scarcity: The Benefits Of Urban Farming, S.A. Claudell, Emilio Mejia Dec 2023

Journal of Nonprofit Innovation

Urban farming can enhance the lives of communities and help reduce food scarcity. This paper presents a conceptual prototype of an efficient urban farming community that can be scaled for a single apartment building or an entire community across all global geoeconomic regions, including densely populated cities and rural, developing towns and communities. When deployed in coordination with smart crop choices, local farm support, and efficient transportation, the result isn’t just sustainability but also increased fresh produce accessibility, optimized nutritional value, elimination of ‘forever chemicals’, reduced transportation costs, and global environmental benefits.

Imagine Doris, who is …


Tiny Machine Learning For Underwater Image Enhancement: Pruning And Quantizaition Approach, Dr Khaled Nagaty, The British University In Egypt, Andreas Pester Dr Dec 2023

Computer Science

Many people have expressed an interest in underwater image processing in a variety of fields, including underwater vehicle control, archaeology, marine biological studies, etc. Underwater exploration is becoming an increasingly important element of our lives, with applications ranging from underwater marine and creature research to pipeline and communication logistics, military use, touristic and entertainment use. Underwater images suffer from poor visibility, distortion, and poor quality for a variety of causes, including light propagation. The major issue arises when these images must be captured at depths greater than 500 feet and artificial lighting needs to be provided. Efficient algorithms and models …


Interpretable Word-Level Sentiment Analysis With Attention-Based Multiple Instance Classification Models, Chenyu Yang Dec 2023

Statistical Science Theses and Dissertations

In this study, our main objective is to tackle the black-box nature of popular machine learning models in sentiment analysis and enhance model interpretability. We aim to gain more insight into the decision-making process of sentiment analysis models, which is often obscure in those complex models. To achieve this goal, we introduce two word-level sentiment analysis models.

The first model is called the attention-based multiple instance classification (AMIC) model. It combines the transparent model structure of multiple instance classification and the self-attention mechanism in deep learning to incorporate the contextual information from documents. As demonstrated by a wine review dataset …


Roadside Lidar Data Processing For Intelligent Transportation System, Md Parvez Mollah Dec 2023

Computer Science ETDs

Roadside LiDAR (Light Detection and Ranging) sensors have recently been explored for Intelligent Transportation Systems, aiming at safer and faster traffic management and vehicular operations. However, massive data volume, occlusion, and limited viewing angles are significant obstacles to the widespread use of roadside LiDARs. In this dissertation, we address three major challenges to enable applications of Intelligent Transportation Systems through roadside LiDAR data: (i) real-time transmission of the massive point-cloud data from the roadside LiDAR devices to the cloud using a 5G network, (ii) mitigating the sensor occlusion problem to increase coverage and detect events occurring in occluded regions of a sensor, …


Utilizing Multitask Transfer Learning For Sonographic Rheumatoid Arthritis Synovitis Grading, Jordan Marie Claire Sanders Dec 2023

Doctoral Dissertations and Master's Theses

Classifying the four sonographic Rheumatoid Arthritis (RA) synovitis grades (Grade 0, Grade 1, Grade 2, and Grade 3) is a difficult problem due to the complexity of the relevant markers. Therefore, the current research proposes a Multitask Transfer Learning (MTL) framework for sonographic RA synovitis grading of Ultrasound (US) images in Brightness mode (B-Mode) and Power Doppler mode.

In the medical community, the lack of reliability of scoring these images has been an issue and reason for concern for doctors and other medical practitioners. The human/machine variability across the acquisition procedure of these US images creates an additional challenge that …


Learning Mortality Risk For Covid-19 Using Machine Learning And Statistical Methods, Shaoshi Zhang Dec 2023

Electronic Thesis and Dissertation Repository

This research investigates the mortality risk of COVID-19 patients across different variant waves, using data from the Centers for Disease Control and Prevention (CDC) websites. By analyzing the available data, including patient medical records, vaccination rates, and hospital capacities, we aim to discern patterns and factors associated with COVID-19-related deaths.

To explore features linked to COVID-19 mortality, we employ different techniques such as Filter, Wrapper, and Embedded methods for feature selection. Furthermore, we apply various machine learning methods, including support vector machines, decision trees, random forests, logistic regression, K-nearest neighbours, naïve Bayes methods, and artificial neural networks, to uncover underlying …


Ohio Recovery Housing: Resident Risk And Outcomes Assessment, Elyjiah Potter, Bivin Sadler Dec 2023

SMU Data Science Review

Addiction and substance abuse disorder are a significant problem in the United States. Over the past two decades, the United States has faced a boom in substance abuse, which has resulted in an increase in death and disruption of families across the nation. The State of Ohio has been particularly hard hit by the crisis, with overdose rates nearly double the national average. Established in the mid-1970s, Sober Living Housing is an alcohol and substance use recovery model emphasizing personal responsibility, sober living, and community support. This model has been adopted by the Ohio Recovery Housing organization, which seeks …


Deep Learning Image Analysis To Isolate And Characterize Different Stages Of S-Phase In Human Cells, Kevin A. Boyd, Rudranil Mitra, John Santerre, Christopher L. Sansam Dec 2023

SMU Data Science Review

Abstract. This research used deep learning for image analysis to isolate and characterize distinct DNA replication patterns in human cells. By leveraging high-resolution microscopy images of multiple cells stained with 5-Ethynyl-2′-deoxyuridine (EdU), a replication marker, this analysis utilized Convolutional Neural Networks (CNNs) to perform image segmentation and to provide robust and reliable classification results. First, multiple cells in a field of focus were identified using a pretrained CNN called Cellpose. After identifying the location of each cell in the image, a Python script was created to crop out each cell into individual .tif files. After careful annotation, a CNN was …
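The cropping step mentioned in this abstract can be pictured with the following minimal sketch, which is not the authors' script. It assumes the segmentation output is an integer label mask in which 0 marks background and each positive value marks one cell; the function name `crop_cells` and the toy arrays are hypothetical, and writing each crop to a .tif file is left out.

```python
import numpy as np

def crop_cells(image, mask):
    """Return {label: cropped sub-image} for every labeled cell in mask."""
    crops = {}
    for label in np.unique(mask):
        if label == 0:  # assumption: 0 marks background pixels
            continue
        # Bounding box of all pixels that carry this label.
        rows, cols = np.nonzero(mask == label)
        r0, r1 = rows.min(), rows.max() + 1
        c0, c1 = cols.min(), cols.max() + 1
        crops[int(label)] = image[r0:r1, c0:c1].copy()
    return crops

# Toy 6x6 "image" with two rectangular "cells" in the label mask.
image = np.arange(36).reshape(6, 6)
mask = np.zeros((6, 6), dtype=int)
mask[0:2, 0:3] = 1
mask[3:6, 4:6] = 2

crops = crop_cells(image, mask)
for label, crop in crops.items():
    print(label, crop.shape)
```

A bounding-box crop keeps any neighboring pixels that fall inside the rectangle; masking those out before saving would be a further design choice.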