Open Access. Powered by Scholars. Published by Universities.®

Categorical Data Analysis Commons

399 Full-Text Articles; 591 Authors; 206,827 Downloads; 91 Institutions

All Articles in Categorical Data Analysis

399 full-text articles. Page 1 of 16.

Why, New York City? Gauging The Quality Of Life Through The Thoughts Of Tweeters, Sheryl Williams 2022 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

As a resource for social data, Twitter’s platform has been used to measure the quality of life through sentiment analysis. This capstone project explores another methodological technique—querying Twitter data around specific keyword terms to determine dominant topics, word patterns, and sentiment leanings in a geographical area. Focusing on New York City and Los Angeles for comparative analysis, the keyword term “why” will be used to build a Python analysis around topic modeling and sentiment analysis. Using this approach, the analysis reveals social and cultural differences, the overall sentiment of tweets, and subjects of interest to tweeters.

GitHub Repository ...


Optimal Time-Dependent Classification For Diagnostic Testing, Prajakta P. Bedekar, Paul Patrone, Anthony Kearsley 2022 Johns Hopkins University

Biology and Medicine Through Mathematics Conference

No abstract provided.


Attempting To Predict The Unpredictable: March Madness, Coleton Kanzmeier 2022 University of Nebraska at Omaha

Theses/Capstones/Creative Projects

Each year, millions upon millions of individuals fill out at least one, if not hundreds, of March Madness brackets. People test their luck every year, whether for fun, with friends or family, or even to win some money. Some people rely on their basketball knowledge, whereas others know it is called March Madness for a reason and take a shot in the dark. Others have even tried using statistics to give them an edge. I intend to follow a similar approach, using statistics to my advantage. The end goal is to predict the 2022 March Madness bracket. To ...


Posterior Predictive Model Checking Of The Hierarchical Rater Model, Nnamdi Chika Ezike 2022 University of Arkansas, Fayetteville

Graduate Theses and Dissertations

Fitting wrongly specified models to observed data may lead to invalid inferences about the model parameters of interest. The current study investigated the performance of the posterior predictive model checking (PPMC) approach in detecting model-data misfit of the hierarchical rater model (HRM). The HRM is a rater-mediated model that incorporates components of the polytomous item response theory (IRT) model, such as the partial credit model (PCM) and generalized partial credit model (GPCM), at the second level of the hierarchy, to model examinees’ responses to performance assessments. To date, the HRM has not been rigorously evaluated using PPMC techniques. Monte Carlo ...


Applying Data Analytics As An Alternative To Subjective Rankings Of Players In Fantasy Basketball, Christopher Collins 2022 United States Military Academy

Mathematica Militaris

This paper demonstrates the ranking of players for fantasy basketball using one of the platforms of Multi Criteria Decision Making (MCDM), the Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS) method. Specifically, it compares results of TOPSIS-generated fantasy rankings from the 2016-2017 NBA Season against industry fantasy experts’ 2017-2018 NBA pre-season rankings. Fantasy experts combine various techniques to create their rankings. Frequently blending quantitative and qualitative factors in order to project bottom-up rankings, they incongruently mix subjective and objective criteria. Conversely, TOPSIS is a mathematical way of doing literally what its name describes, ranking by a ...
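As an illustration of the TOPSIS idea named in this abstract, the following is a minimal sketch of the textbook procedure, not the paper's own implementation. The function name `topsis`, the player labels, the statistics, and the weights are all hypothetical values chosen for the example. It assumes every criterion column has at least one non-zero value and that the alternatives are not all identical.

```python
import math

def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] for each alternative (row).

    matrix  -- rows are alternatives, columns are criteria
    weights -- importance of each criterion (hypothetical values here)
    benefit -- True if larger is better for that criterion, False if smaller is better
    """
    n_crit = len(weights)
    # Step 1: vector-normalize each column so criteria are comparable.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    # Step 2: apply the criterion weights.
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # Step 3: build the ideal and anti-ideal points, criterion by criterion.
    cols = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    # Step 4: score each alternative by its relative distance from the anti-ideal point.
    scores = []
    for row in weighted:
        d_best = math.dist(row, ideal)
        d_worst = math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores

# Hypothetical per-game stats: points, rebounds, turnovers (turnovers are a cost).
players = {"A": [25.0, 8.0, 3.5], "B": [18.0, 11.0, 2.0], "C": [10.0, 4.0, 1.0]}
scores = topsis(list(players.values()), weights=[0.5, 0.3, 0.2], benefit=[True, True, False])
ranking = sorted(zip(players, scores), key=lambda pair: pair[1], reverse=True)
print(ranking)
```

A higher score means the alternative sits closer to the ideal point and farther from the anti-ideal point. The choice of weights drives the ranking, so in practice they would need to be justified rather than picked by hand as they are here.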


Impact Of Treatment Length On Individuals With Substance Use Disorders In Allegheny County, Cassie DiBenedetti, Kate Rosello 2022 Duquesne University

Undergraduate Research and Scholarship Symposium

Auberle social services is opening the Family Healing Center (FHC), a level 3.5 treatment program in Pittsburgh, PA that provides housing and 24-hour support for families struggling with opioid addiction. We partnered with Auberle to study characteristics of individuals receiving level 3.5 treatment and to determine whether longer treatment lengths correlate with fewer adverse outcomes. We obtained data from the Allegheny County Department of Human Services on 2,016 individuals admitted to level 3.5 treatment in 2019. The data included birth year, race, gender, admittance date, discharge date, and Children Youth and Family (CYF) incidents before and ...


Machine Learning In Support Of Student Success, Rachel Rucker 2022 Stephen F Austin State University

Undergraduate Research Conference

Our goal is to predict whether a student will finish the semester on academic probation by mid-term using university data.


Power Properties Of Ordinal Regression Models For Likert Type Data, Ulf Olsson 2022 Swedish University of Agricultural Sciences

Practical Assessment, Research, and Evaluation

We discuss analysis of 5-grade Likert type data in the two-sample case. Analyses using two-sample t tests, nonparametric Wilcoxon tests, and ordinal regression methods are compared using simulated data based on an ordinal regression paradigm. One thousand pairs of samples of size n=10 and n=30 were generated, with three different degrees of skewness. For all sample sizes and degrees of skewness, the ordinal probit model has the highest power. This is not surprising since the data was generated with this model in mind. Slightly more surprising is that the t test has higher power than the Wilcoxon test in ...
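The kind of simulation described in this abstract can be sketched roughly as follows. This is an illustrative assumption about the setup, not the author's code: 5-grade responses are generated by thresholding a latent normal variable (the ordinal probit idea), and the power of a pooled two-sample t test is estimated by repetition. The cutpoints, the shift sizes, and the critical value of 2.0 (roughly the two-sided 5% value for 58 degrees of freedom) are hypothetical choices.

```python
import math
import random
import statistics

def likert_sample(n, shift, cutpoints=(-1.5, -0.5, 0.5, 1.5), rng=random):
    """Draw n responses on a 1-5 scale by cutting a latent normal variable."""
    responses = []
    for _ in range(n):
        z = rng.gauss(shift, 1.0)  # latent score; shift moves the group mean
        responses.append(1 + sum(z > c for c in cutpoints))
    return responses

def pooled_t(x, y):
    """Classical two-sample t statistic with a pooled variance estimate."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0  # both samples constant: no evidence of a difference
    return (statistics.mean(x) - statistics.mean(y)) / se

def estimate_power(shift, n=30, reps=1000, crit=2.0, seed=0):
    """Share of simulated sample pairs in which |t| exceeds the critical value."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        control = likert_sample(n, 0.0, rng=rng)
        treated = likert_sample(n, shift, rng=rng)
        if abs(pooled_t(treated, control)) > crit:
            rejections += 1
    return rejections / reps

print(estimate_power(shift=0.0))  # no true difference: estimates the type I error rate
print(estimate_power(shift=0.5))  # true latent shift: estimates the power
```

Skewed response distributions could be explored by moving the cutpoints so they are no longer symmetric around zero. Comparing against the Wilcoxon test or an ordinal probit fit would need a statistics library and is left out of this sketch.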


Split Classification Model For Complex Clustered Data, Katherine Gerot 2022 University of Nebraska - Lincoln

Honors Theses, University of Nebraska-Lincoln

Classification in high-dimensional data has generated tremendous interest in a multitude of fields. Data in higher dimensions often tend to reside in a non-Euclidean metric space. This prevents Euclidean-based classification methodologies, such as regression, from reliably modeling the data. Many proposed models rely on computationally complex embedding to convert the data to a more usable format. Others, namely the Support Vector Machine, rely on kernel manipulation to implicitly describe the "feature space" to arrive at a non-linear decision boundary. The proposed methodology in this paper seeks to classify complex data in a relatively computationally simple and explainable manner.


Session 5: Equipment Finance Credit Risk Modeling - A Case Study In Creative Model Development & Nimble Data Engineering, Edward Krueger, Landon Thompson, Josh Moore 2022 Channel Partners

SDSU Data Science Symposium

This presentation will focus first on providing an overview of Channel and the Risk Analytics team that performed this case study. Given that context, we’ll then dive into our approach for building the modeling development data set, techniques and tools used to develop and implement the model into a production environment, and some of the challenges faced upon launch. Then, the presentation will pivot to the data engineering pipeline. During this portion, we will explore the application process and what happens to the data we collect. This will include how we extract & store the data along with how it ...


Slices Of The Big Apple: A Visual Explanation And Analysis Of The New York City Budget, Joanne Ramadani 2022 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

As a component of government, budgets are fundamental not only to improving the quality of a shared society, but also to understanding what our government officials consider to be their priorities. However, most budgets can be difficult to understand, using terms that are not familiar to people who have not studied finance or economics. To that end, Slices of the Big Apple is an interactive, centralized narrative website that uses visualizations at its core in order to: 1) facilitate a holistic understanding of the New York City government budget for NYC residents; and 2) conduct a five-year analysis of Community ...


The Data Analytics And The Science Revolution, Leila Halawi, Amal Clarke, Kelly George 2022 Embry-Riddle Aeronautical University

Publications

This text highlights the difference between analytics and data science, using predictive analytic techniques to analyze different historical data, including aviation data and concrete data, interpreting the predictive models, and highlighting the steps to deploy the models and the steps ahead. The book combines the conceptual perspective and a hands-on approach to predictive analytics using SAS VIYA, an analytic and data management platform. The authors use SAS VIYA to focus on analytics to solve problems, highlight how analytics is applied in the airline and business environment, and compare several different modeling techniques. They decipher complex algorithms to demonstrate how they ...


Graph Neural Networks For Improved Interpretability And Efficiency, Patrick Pho 2022 University of Central Florida

Electronic Theses and Dissertations, 2020-

Attributed graphs are a powerful tool to model real-life systems, which exist in many domains such as social science, biology, e-commerce, etc. The behaviors of those systems are mostly defined by or dependent on their corresponding network structures. Graph analysis has become an important line of research due to the rapid integration of such systems into every aspect of human life and the profound impact they have on human behaviors. Graph structured data contains a rich amount of information from the network connectivity and the supplementary input features of nodes. Machine learning algorithms or traditional network science tools have limitation ...


A Monte Carlo Simulation Of Rat Choice Behavior With Interdependent Outcomes, Michelle A. Frankot 2022 West Virginia University

Graduate Theses, Dissertations, and Problem Reports

Preclinical behavioral neuroscience often uses choice paradigms to capture psychiatric symptoms. In particular, the subfield of operant research produces nested datasets with many discrete choices in a session. The standard analytic practice is to aggregate choice into a continuous variable and analyze using ANOVA or linear regression. However, choice data often have multiple interdependent outcomes of interest, violating an assumption of general linear models. The aim of the current study was to quantify the accuracy of linear mixed-effects regression (LMER) for analyzing data from a 4-choice operant task called the Rodent Gambling Task (RGT), which measures decision-making in the context ...


Smoking, Alcohol Consumption, And Depression In Association With Incidence Of Type 2 Diabetes Among Mexican Americans In Starr County, Texas, Gabriela Rubannelsonkumar 2021 St. Mary's University

St. Mary's University Honors Theses and Projects

Previous studies on conditions like obesity, hypertension, and type 2 diabetes mellitus (T2DM) have explored the correlations between them and various other human conditions, including aortic stiffness, left ventricular hypertrophy, and sleep apnea, as they predict possibilities of developing certain diseases in Mexican Americans. This study aims to observe the correlation between lifestyle decisions that could relate to the onset of depression in normal, prediabetic, and diabetic individuals. These include smoking habits and alcohol consumption. Many papers have previously examined these lifestyle habits as they relate to obesity, hypertension, and diabetes; however, they have done so in a singular ...


Aspect-Based Sentiment Analysis Of Movie Reviews, Samuel Onalaja, Eric Romero, Bosang Yun 2021 Southern Methodist University

SMU Data Science Review

This study compares classification models used to determine aspect-based separated text sentiment and predict binary sentiments of movie reviews with genre- and aspect-specific driving factors. To gain a broader classification analysis, five machine and deep learning algorithms were compared: Logistic Regression (LR), Naive Bayes (NB), Support Vector Machine (SVM), and Recurrent Neural Network Long-Short-Term Memory (RNN LSTM). The various movie aspects that are utilized to separate the sentences are determined through aggregating aspect words from lexicon-based, supervised and unsupervised learning. The driving factors are randomly assigned to various movie aspects and their impact tied to ...


Identification And Characterization Of Forest Fire Risk Zones Leveraging Machine Learning Methods, Joshua Balson, Matt Chinchilla, Cam Lu, Jeff Washburn, Nibhrat Lohia 2021 Southern Methodist University

SMU Data Science Review

Across the United States, record numbers of wildfires are observed, costing billions of dollars in property damage, polluting the environment, and putting lives at risk. The ability of emergency management professionals, city planners, and private entities such as insurance companies to determine if an area is at higher risk of a fire breaking out has never been greater. This paper proposes a novel methodology for identifying and characterizing zones with increased risks of forest fires. Methods involving machine learning techniques use widely available recorded data, thus making it possible to implement the tool quickly.


Integration Of Blockchain Technology Into Automobiles To Prevent And Study The Causes Of Accidents, John Kim 2021 California State University, San Bernardino

Electronic Theses, Projects, and Dissertations

Automobile collisions occur daily. We now live in an information-driven world, one where technology is quickly evolving. Blockchain technology can change the automotive industry, the safety of the motoring public and its surrounding environment by incorporating this vast array of information. It can place safety and efficiency at the forefront for pedestrians and public establishments, and provide public agencies with pertinent information securely and efficiently. Other industries in which Blockchain technology has been effective include supply chain management, logistics, and banking. This paper reviews some statistical information regarding automobile collisions, Blockchain technology, Smart Contracts, Smart Cities; assesses the feasibility ...


Data Consultations, Racism, And Critiquing Colonialism In Demographic Datasheets, Nina Exner, Erin Carrillo, Sam A. Leif 2021 Virginia Commonwealth University

Journal of eScience Librarianship

Objective: We consider how data librarians can take antiracist action in education and consultations. We attempt to apply QuantCrit thinking, particularly to demographic datasheets.

Methods: We synthesize historical context with modern critical thinking about race and data to examine the origins of current assumptions about data. We then present examples of how racial categories can hide, rather than reveal, racial disparities. Finally, we apply the Model of Domain Learning to explain why data science and data management experts can and should expose experts in subject research to the idea of critically examining demographic data collection.

Results: There are good reasons ...


Why Does An Ex-Offender Reoffend?, Jacob Rybak 2021 Kennesaw State University

Symposium of Student Scholars

What leads an offender to go back to prison? Iowa has collected data tracking recidivism to evaluate the effectiveness of its programs for released offenders. This data set includes the following for all of the offenders: age group, type of release (parole vs. being discharged at the end of their sentence), race, sex, year of release, supervising district, original offense, and whether they recidivated. For the offenders who return to prison, the data set includes measures on days to return, type of recidivism (technicality or new crime), and the specific offense that caused their return.

In the ...

