Open Access. Powered by Scholars. Published by Universities.®

Categorical Data Analysis Commons

413 Full-Text Articles | 611 Authors | 224,200 Downloads | 94 Institutions

All Articles in Categorical Data Analysis

413 full-text articles. Page 1 of 16.

Payments In Gambling, Kasra Ghaharian 2023 University of Nevada, Las Vegas

International Conference on Gambling & Risk Taking

A considerable body of gambling-related research has addressed the task of segmenting a sample population of gamblers into homogeneous sub-groups. Typically, “static” features are used as model inputs for cluster analysis, where variables are aggregated for each individual over a specified period of time; for example, the total amount wagered per gambler over the course of a study period. Engineering features in this way fails to capture the intricacies of a gambler’s behavior over time. Recent works have begun to address this limitation by using time-series data as model inputs and by employing trajectory analysis. While these methods incorporate the …


Open Data Indicates That Collegedale Could Be A Bluezone, Tristan Deschamps, Alva Johnson 2023 Southern Adventist University

Campus Research Day

A blue zone is an indicator of exceptional health in a community. Adventists have a blue zone community in Loma Linda, but there has been little research into other Adventist-populated areas that could be blue zones. Therefore, our goal is to show that open data suggests that a blue zone may exist near Southern Adventist University, specifically in Collegedale. This data has been gathered from different federal sources, including the CDC, the US Census Bureau, the Tennessee Department of Health, official state records, and federal documents that are available to the public.


Fraud Pattern Detection For Nft Markets, Andrew Leppla, Jorge Olmos, Jaideep Lamba 2023 Southern Methodist University

SMU Data Science Review

Non-Fungible Tokens (NFTs) enable ownership and transfer of digital assets using blockchain technology. As a relatively new financial asset class, NFTs lack robust oversight and regulations. These conditions create an environment that is susceptible to fraudulent activity and market manipulation schemes. This study examines the buyer-seller network transactional data from some of the most popular NFT marketplaces (e.g., AtomicHub, OpenSea) to identify and predict fraudulent activity. To accomplish this goal, multiple features such as price, volume, and network metrics were extracted from NFT transactional data. These were fed into a Multiple-Scale Convolutional Neural Network that predicts suspected fraudulent activity based …


Analyzing Relationships With Machine Learning, Oscar Ko 2023 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

Procedurally, this project aims to take a dataset, analyze it, and offer insights to the audience in an easy-to-digest format. Conceptually, this project will seek to explore questions like: “Do couples that meet through online dating or dating apps have higher or lower quality relationships?”, “Can any features in this dataset help predict how a subject would rate their relationship quality?”, and “What other insights can I derive from using machine learning for exploratory analysis?” The intended audience for this project is anyone interested in romantic relationships or machine learning.

The dataset is from a Stanford University survey, “How Couples …


Predictors Of Covid-19 Vaccination Rate In Usa: A Machine Learning Approach, Syed M. I. Osman, Ahmed Sabit 2022 Sacred Heart University

WCBT Faculty Publications

In this study, we examine state-level features and policies that are most important in achieving a threshold-level vaccination rate to curb the effects of the COVID-19 pandemic. We employ CHAID, a decision tree algorithm, on three different model specifications to answer this question based on a dataset that includes all the states in the United States. Workplace travel emerges as the most important predictor; however, the governors’ political affiliation (PA) replaces it in a more conservative feature set that includes economic features and the growth rate of COVID-19 cases. We also employ several alternative algorithms as a robustness check. …


Mle And Eap Methods For Estimating Ability Scores For Data Of Varying Sample Size And Item Length, Sahar Taji 2022 University of Arkansas, Fayetteville

Graduate Theses and Dissertations

In this research, the performance of two popular estimators, the Maximum Likelihood Estimator (MLE) and the Bayesian Expected a Posteriori (EAP) estimator, is studied and compared in estimating the latent ability score in an Item Response Theory (IRT) model. The 2-Parameter Logistic (2PL) IRT model, which is characterized by difficulty and discrimination item parameters, is used to estimate the latent ability scores. Several datasets are generated for a variety of sample size and item length values. Monte Carlo simulation is used to analyze the performance of the estimators. Results show that MLE produces reliable results with low root mean square error (RMSE) across all datasets. …
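
As a rough illustration of the two estimators named in this abstract, the sketch below scores one examinee under a 2PL model, once by maximizing the likelihood over a grid (MLE) and once by averaging over the posterior with a standard normal prior (EAP). It is not the thesis's code: the item parameters, the response pattern, the grid range, and the N(0, 1) prior are all assumptions chosen for the example.

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: P(correct | ability theta),
    with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, responses, items):
    """Log-likelihood of a 0/1 response pattern at ability theta."""
    total = 0.0
    for x, (a, b) in zip(responses, items):
        p = p_correct(theta, a, b)
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total

# Hypothetical ability grid from -4 to 4 in steps of 0.01.
GRID = [-4.0 + 0.01 * i for i in range(801)]

def mle_ability(responses, items):
    """Grid-search MLE: the grid point with the largest log-likelihood."""
    return max(GRID, key=lambda t: log_likelihood(t, responses, items))

def eap_ability(responses, items):
    """EAP: posterior mean of theta under an assumed N(0, 1) prior,
    approximated by a weighted sum over the grid."""
    weights = [math.exp(log_likelihood(t, responses, items) - 0.5 * t * t)
               for t in GRID]
    return sum(t * w for t, w in zip(GRID, weights)) / sum(weights)

# Hypothetical (discrimination, difficulty) pairs and one response pattern.
items = [(1.0, -1.0), (1.2, -0.5), (0.8, 0.0), (1.5, 0.5), (1.0, 1.0)]
responses = [1, 1, 1, 0, 0]

if __name__ == "__main__":
    print("MLE:", mle_ability(responses, items))
    print("EAP:", eap_ability(responses, items))
```

With a short test like this one, the prior pulls the EAP estimate toward zero relative to the MLE. For an all-correct or all-incorrect pattern the likelihood has no interior maximum, so the grid MLE simply lands on the grid boundary, while the EAP stays finite.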


Learning From Public Spaces In Historic Cities, Cody Josh Kucharski 2022 Kennesaw State University

Symposium of Student Scholars

Successful public spaces in cities are key for enhancing social cohesion and improving health and safety. Learning from historic cities involves the development of representational and analytical tools aimed at capturing their essence as places of human interaction. The research reports findings of the spatial analysis of twenty Adriatic and Ionian coastal cities, which addresses the question of how the network of public spaces calibrates different degrees of spatial enclosure necessary for creating successful social interactions. Cities in the littoral region include well-preserved historic centers that are renowned for the successful integration of urban squares into the urban fabric. For …


Classification Of Breast Cancer Histopathological Images Using Semi-Supervised Gans, Balaji Avvaru, Nibhrat Lohia, Sowmya Mani, Vijayasrikanth Kaniti 2022 Southern Methodist University

SMU Data Science Review

Breast cancer is diagnosed more frequently than skin cancer in women in the United States. Most breast cancer cases are diagnosed in women, while children and men are less likely to develop the disease. Various tissues in the breast grow uncontrollably, resulting in breast cancer. Diagnostic methods analyze microscopic histopathology images to help detect cancer cells accurately. Deep learning is one of the evolving techniques to classify images, where accuracy depends on the volume and quality of labeled images. This study trained various pre-trained models on the histopathological images and analyzed these models to create a new …


Cov-Inception: Covid-19 Detection Tool Using Chest X-Ray, Aswini Thota, Ololade Awodipe, Rashmi Patel 2022 Southern Methodist University

SMU Data Science Review

Since the pandemic started, researchers have been trying to find a cost-effective, fast, and reliable way to detect COVID-19 in order to keep the economy viable and running. This research details how chest X-ray radiography can be utilized to detect the infection. Such a tool could be implemented in airports, schools, and places of business. Currently, chest imaging is not a first-line test for COVID-19 due to low diagnostic accuracy and confounding with other viral pneumonias. Different pre-trained algorithms were fine-tuned and applied to the images to train the model, and the best model obtained was a fine-tuned InceptionV3 model …


Computer Aided Diagnosis System For Breast Cancer Using Deep Learning., Asma Baccouche 2022 University of Louisville

Electronic Theses and Dissertations

The recent rise of big data technology surrounding electronic systems and developed toolkits gave birth to new promises for Artificial Intelligence (AI). With the continuous use of data-centric systems and machines in our lives, such as social media, surveys, emails, reports, etc., there is no doubt that data has become the center of attention for scientists and motivated them to provide more decision-making and operational support systems across multiple domains. With the recent breakthroughs in artificial intelligence, the use of machine learning and deep learning models has achieved remarkable advances in computer vision, ecommerce, cybersecurity, and healthcare. Particularly, numerous …


Why, New York City? Gauging The Quality Of Life Through The Thoughts Of Tweeters, Sheryl Williams 2022 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

As a resource for social data, Twitter’s platform has been used to measure the quality of life through sentiment analysis. This capstone project explores another methodological technique—querying Twitter data around specific keyword terms to determine dominant topics, word patterns, and sentiment leanings in a geographical area. Focusing on New York City and Los Angeles for comparative analysis, the keyword term “why” will be used to build a Python analysis around topic modeling and sentiment analysis. Using this approach, the analysis reveals social and cultural differences, the overall sentiment of tweets, and subjects of interest to tweeters.

GitHub Repository for all …


Optimal Time-Dependent Classification For Diagnostic Testing, Prajakta P. Bedekar, Paul Patrone, Anthony Kearsley 2022 Johns Hopkins University

Biology and Medicine Through Mathematics Conference

No abstract provided.


Attempting To Predict The Unpredictable: March Madness, Coleton Kanzmeier 2022 University of Nebraska at Omaha

Theses/Capstones/Creative Projects

Each year, millions upon millions of individuals fill out at least one if not hundreds of March Madness brackets. People test their luck every year, whether for fun, with friends or family, or to even win some money. Some people rely on their basketball knowledge whereas others know it is called March Madness for a reason and take a shot in the dark. Others have even tried using statistics to give them an edge. I intend to follow a similar approach, using statistics to my advantage. The end goal is to predict this year’s, 2022, March Madness bracket. To achieve …


Posterior Predictive Model Checking Of The Hierarchical Rater Model, Nnamdi Chika Ezike 2022 University of Arkansas, Fayetteville

Graduate Theses and Dissertations

Fitting wrongly specified models to observed data may lead to invalid inferences about the model parameters of interest. The current study investigated the performance of the posterior predictive model checking (PPMC) approach in detecting model-data misfit of the hierarchical rater model (HRM). The HRM is a rater-mediated model that incorporates components of the polytomous item response theory (IRT) model, such as the partial credit model (PCM) and generalized partial credit model (GPCM), at the second level of the hierarchy, to model examinees’ responses to performance assessments. To date, the HRM has not been rigorously evaluated using PPMC techniques. Monte Carlo …


Applying Data Analytics As An Alternative To Subjective Rankings Of Players In Fantasy Basketball, Christopher Collins 2022 United States Military Academy

Mathematica Militaris

This paper demonstrates the ranking of players for fantasy basketball using one of the platforms of Multi Criteria Decision Making (MCDM), the Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS) method. Specifically, it compares results of TOPSIS-generated fantasy rankings from the 2016-2017 NBA Season against industry fantasy experts’ 2017-2018 NBA pre-season rankings. Fantasy experts combine various techniques to create their rankings. Frequently blending quantitative and qualitative factors in order to project bottom-up rankings, they incongruently mix subjective and objective criteria. Conversely, TOPSIS is a mathematical way of doing literally what its name describes, ranking by a …
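
To make the TOPSIS idea concrete, here is a minimal sketch of the standard procedure: vector-normalize each criterion, apply weights, locate the ideal and anti-ideal alternatives, and score each alternative by its relative closeness to the ideal. The player statistics, the equal weights, and the choice of criteria are invented for illustration and are not taken from the paper.

```python
import math

def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] for each row (alternative).

    matrix  -- rows are alternatives, columns are criteria
    weights -- one weight per criterion
    benefit -- True if larger is better for that criterion, False if smaller is
    Assumes the rows are not all identical (otherwise both distances are zero).
    """
    n_cols = len(matrix[0])
    # Vector normalization: divide each column by its Euclidean norm.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_cols)]
                for row in matrix]
    columns = list(zip(*weighted))
    # Ideal takes the best value per criterion, anti-ideal the worst.
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, ideal)
        d_worst = math.dist(row, anti)
        scores.append(d_worst / (d_best + d_worst))
    return scores

# Hypothetical per-game stats: points, rebounds, turnovers (a cost criterion).
players = ["Player A", "Player B", "Player C"]
stats = [[25, 8, 2], [18, 10, 3], [12, 4, 4]]
scores = topsis(stats, weights=[1 / 3, 1 / 3, 1 / 3], benefit=[True, True, False])

if __name__ == "__main__":
    for name, score in sorted(zip(players, scores), key=lambda p: -p[1]):
        print(f"{name}: {score:.3f}")
```

Marking turnovers as a cost criterion is what lets a single distance calculation handle "more is better" and "less is better" statistics together. The weights are where any remaining subjectivity enters.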


Impact Of Treatment Length On Individuals With Substance Use Disorders In Allegheny County, Cassie DiBenedetti, Kate Rosello 2022 Duquesne University

Undergraduate Research and Scholarship Symposium

Auberle social services is opening the Family Healing Center (FHC), a level 3.5 treatment program in Pittsburgh, PA that provides housing and 24-hour support for families struggling with opioid addiction. We partnered with Auberle to study characteristics of individuals receiving level 3.5 treatment and to determine whether longer treatment lengths correlate with fewer adverse outcomes. We obtained data from the Allegheny County Department of Human Services on 2,016 individuals admitted to level 3.5 treatment in 2019. The data included birth year, race, gender, admittance date, discharge date, and Children Youth and Family (CYF) incidents before and after treatment. We categorized …


Machine Learning In Support Of Student Success, Rachel Rucker 2022 Stephen F Austin State University

Undergraduate Research Conference

Our goal is to predict whether a student will finish the semester on academic probation by mid-term using university data.


Power Properties Of Ordinal Regression Models For Likert Type Data, Ulf Olsson 2022 Swedish University of Agricultural Sciences

Practical Assessment, Research, and Evaluation

We discuss analysis of 5-grade Likert type data in the two-sample case. Analyses using two-sample t tests, nonparametric Wilcoxon tests, and ordinal regression methods are compared using simulated data based on an ordinal regression paradigm. One thousand pairs of samples of size n=10 and n=30 were generated, with three different degrees of skewness. For all sample sizes and degrees of skewness, the ordinal probit model has the highest power. This is not surprising since the data was generated with this model in mind. Slightly more surprising is that the t test has higher power than the Wilcoxon test in …
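
The kind of simulation described here can be sketched in a few lines. The example below draws 5-point responses from an ordinal probit model (a latent normal variable cut at fixed thresholds) and estimates how often a pooled two-sample t test and a tie-corrected Wilcoxon rank-sum test reject at the 5% level. The thresholds, the latent shift, the number of replications, and the use of fixed critical values (2.0017 for 58 degrees of freedom, 1.96 for the normal approximation) are assumptions for illustration; the paper's skewness conditions and its ordinal regression fits are not reproduced.

```python
import math
import random

THRESHOLDS = [-1.5, -0.5, 0.5, 1.5]  # hypothetical cut points giving 5 categories

def likert_sample(n, shift, rng):
    """Draw n responses in 1..5 from an ordinal probit model with latent mean shift."""
    out = []
    for _ in range(n):
        z = rng.gauss(shift, 1.0)
        out.append(1 + sum(z > t for t in THRESHOLDS))
    return out

def t_rejects(x, y, crit=2.0017):
    """Pooled two-sample t test for equal group sizes; crit assumes n = 30 per group."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((v - mx) ** 2 for v in x) / (n - 1)
    vy = sum((v - my) ** 2 for v in y) / (n - 1)
    se = math.sqrt((vx + vy) / n)
    return se > 0 and abs(mx - my) / se > crit

def wilcoxon_rejects(x, y, crit=1.96):
    """Wilcoxon rank-sum test with midranks and tie-corrected normal approximation."""
    pooled = sorted(x + y)
    n1, n2, total = len(x), len(y), len(x) + len(y)
    rank, tie_term, i = {}, 0.0, 0
    while i < total:
        j = i
        while j < total and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    w = sum(rank[v] for v in x)
    mean = n1 * (total + 1) / 2.0
    var = n1 * n2 / 12.0 * ((total + 1) - tie_term / (total * (total - 1)))
    return var > 0 and abs(w - mean) / math.sqrt(var) > crit

def power(test, shift, n=30, reps=500, seed=1):
    """Fraction of simulated sample pairs in which the test rejects."""
    rng = random.Random(seed)
    hits = sum(test(likert_sample(n, 0.0, rng), likert_sample(n, shift, rng))
               for _ in range(reps))
    return hits / reps

if __name__ == "__main__":
    for shift in (0.0, 0.5, 1.0):
        print(shift, power(t_rejects, shift), power(wilcoxon_rejects, shift))
```

Running `power` with `shift=0.0` estimates each test's type I error rate, which is a useful sanity check before comparing power at nonzero shifts.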


Split Classification Model For Complex Clustered Data, Katherine Gerot 2022 University of Nebraska - Lincoln

Honors Theses, University of Nebraska-Lincoln

Classification in high-dimensional data has generated tremendous interest in a multitude of fields. Data in higher dimensions often tend to reside in a non-Euclidean metric space. This prevents Euclidean-based classification methodologies, such as regression, from reliably modeling the data. Many proposed models rely on computationally complex embedding to convert the data to a more usable format. Others, namely the Support Vector Machine, rely on kernel manipulation to implicitly describe the "feature space" to arrive at a non-linear decision boundary. The proposed methodology in this paper seeks to classify complex data in a relatively computationally simple and explainable manner.


Session 5: Equipment Finance Credit Risk Modeling - A Case Study In Creative Model Development & Nimble Data Engineering, Edward Krueger, Landon Thompson, Josh Moore 2022 Channel Partners

SDSU Data Science Symposium

This presentation will focus first on providing an overview of Channel and the Risk Analytics team that performed this case study. Given that context, we’ll then dive into our approach for building the modeling development data set, techniques and tools used to develop and implement the model into a production environment, and some of the challenges faced upon launch. Then, the presentation will pivot to the data engineering pipeline. During this portion, we will explore the application process and what happens to the data we collect. This will include how we extract & store the data along with how it …

