Open Access. Powered by Scholars. Published by Universities.®

Categorical Data Analysis Commons

393 Full-Text Articles; 583 Authors; 198,642 Downloads; 91 Institutions

All Articles in Categorical Data Analysis

Optimal Time-Dependent Classification For Diagnostic Testing, Prajakta P. Bedekar, Paul Patrone, Anthony Kearsley 2022 Johns Hopkins University

Biology and Medicine Through Mathematics Conference

No abstract provided.


Impact Of Treatment Length On Individuals With Substance Use Disorders In Allegheny County, Cassie DiBenedetti, Kate Rosello 2022 Duquesne University

Undergraduate Research and Scholarship Symposium

Auberle social services is opening the Family Healing Center (FHC), a level 3.5 treatment program in Pittsburgh, PA that provides housing and 24-hour support for families struggling with opioid addiction. We partnered with Auberle to study characteristics of individuals receiving level 3.5 treatment and to determine whether longer treatment lengths correlate with fewer adverse outcomes. We obtained data from the Allegheny County Department of Human Services on 2,016 individuals admitted to level 3.5 treatment in 2019. The data included birth year, race, gender, admittance date, discharge date, and Children Youth and Family (CYF) incidents before and ...


Attempting To Predict The Unpredictable: March Madness, Coleton Kanzmeier 2022 University of Nebraska at Omaha

Theses/Capstones/Creative Projects

Each year, millions upon millions of individuals fill out at least one, if not hundreds, of March Madness brackets. People test their luck every year, whether for fun, with friends or family, or even to win some money. Some people rely on their basketball knowledge, whereas others know it is called March Madness for a reason and take a shot in the dark. Others have even tried using statistics to give them an edge. I intend to follow a similar approach, using statistics to my advantage. The end goal is to predict this year’s (2022) March Madness bracket. To ...


Applying Data Analytics As An Alternative To Subjective Rankings Of Players In Fantasy Basketball, Christopher Collins 2022 United States Military Academy

Mathematica Militaris

This paper demonstrates the ranking of players for fantasy basketball using one of the platforms of Multi Criteria Decision Making (MCDM), the Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS) method. Specifically, it compares results of TOPSIS-generated fantasy rankings from the 2016-2017 NBA Season against industry fantasy experts’ 2017-2018 NBA pre-season rankings. Fantasy experts combine various techniques to create their rankings. Frequently blending quantitative and qualitative factors in order to project bottom-up rankings, they incongruently mix subjective and objective criteria. Conversely, TOPSIS is a mathematical way of doing literally what its name describes, ranking by a ...
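The TOPSIS method named in the abstract above can be illustrated with a minimal sketch. This is not the paper's implementation: the player names, statistics, weights, and the choice of vector normalisation are all hypothetical assumptions made for illustration. Each alternative is scored by its relative closeness to an ideal solution (best value on every criterion) versus an anti-ideal one (worst value on every criterion).

```python
import math

def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] for each row (alternative).

    matrix  : list of rows, one per alternative, one column per criterion
    weights : importance of each criterion (hypothetical values here)
    benefit : True if larger is better for that criterion, False if smaller is
    Assumes no criterion column is all zeros and the rows are not all identical.
    """
    n_crit = len(weights)
    # Step 1: vector-normalise each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
                for row in matrix]
    # Step 2: build the ideal and anti-ideal points criterion by criterion.
    columns = list(zip(*weighted))
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    anti = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    # Step 3: relative closeness = distance to anti-ideal / total distance.
    scores = []
    for row in weighted:
        d_pos = math.dist(row, ideal)
        d_neg = math.dist(row, anti)
        scores.append(d_neg / (d_pos + d_neg))
    return scores

# Hypothetical per-game stats: points, rebounds, turnovers (turnovers are a cost).
players = {"Player A": [25, 5, 3], "Player B": [18, 10, 2], "Player C": [10, 4, 4]}
scores = topsis(list(players.values()),
                weights=[0.5, 0.3, 0.2],
                benefit=[True, True, False])
# Rank players from highest to lowest closeness score.
ranking = sorted(zip(players, scores), key=lambda pair: pair[1], reverse=True)
```

In this toy data, Player C is worst on every criterion, so it coincides with the anti-ideal point and ranks last regardless of the weights chosen.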


Machine Learning In Support Of Student Success, Rachel Rucker 2022 Stephen F Austin State University

Undergraduate Research Conference

Our goal is to predict by mid-term, using university data, whether a student will finish the semester on academic probation.


Power Properties Of Ordinal Regression Models For Likert Type Data, Ulf Olsson 2022 Swedish University of Agricultural Sciences

Practical Assessment, Research, and Evaluation

We discuss analysis of 5-grade Likert type data in the two-sample case. Analyses using two-sample t tests, nonparametric Wilcoxon tests, and ordinal regression methods are compared using simulated data based on an ordinal regression paradigm. One thousand pairs of samples of size n=10 and n=30 were generated, with three different degrees of skewness. For all sample sizes and degrees of skewness, the ordinal probit model has the highest power. This is not surprising since the data were generated with this model in mind. Slightly more surprising is that the t test has higher power than the Wilcoxon test in ...


Split Classification Model For Complex Clustered Data, Katherine Gerot 2022 University of Nebraska - Lincoln

Honors Theses, University of Nebraska-Lincoln

Classification in high-dimensional data has generated tremendous interest in a multitude of fields. Data in higher dimensions often tend to reside in non-Euclidean metric space. This prevents Euclidean-based classification methodologies, such as regression, from reliably modeling the data. Many proposed models rely on computationally-complex embedding to convert the data to a more usable format. Others, namely the Support Vector Machine, rely on kernel manipulation to implicitly describe the "feature space" to arrive at a non-linear decision boundary. The proposed methodology in this paper seeks to classify complex data in a relatively computationally-simple and explainable manner.


Session 5: Equipment Finance Credit Risk Modeling - A Case Study In Creative Model Development & Nimble Data Engineering, Edward Krueger, Landon Thompson, Josh Moore 2022 Channel Partners

SDSU Data Science Symposium

This presentation will focus first on providing an overview of Channel and the Risk Analytics team that performed this case study. Given that context, we’ll then dive into our approach for building the modeling development data set, techniques and tools used to develop and implement the model into a production environment, and some of the challenges faced upon launch. Then, the presentation will pivot to the data engineering pipeline. During this portion, we will explore the application process and what happens to the data we collect. This will include how we extract & store the data along with how it ...


Slices Of The Big Apple: A Visual Explanation And Analysis Of The New York City Budget, Joanne Ramadani 2022 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

As a component of government, budgets are fundamental not only to improving the quality of a shared society, but also to understanding what our government officials consider to be their priorities. However, most budgets can be difficult to understand, using terms that are not familiar to people who have not studied finance or economics. To that end, Slices of the Big Apple is an interactive, centralized narrative website that uses visualizations at its core in order to: 1) facilitate a holistic understanding of the New York City government budget for NYC residents; and 2) conduct a five-year analysis of Community ...


Smoking, Alcohol Consumption, And Depression In Association With Incidence Of Type 2 Diabetes Among Mexican Americans In Starr County, Texas, Gabriela Rubannelsonkumar 2021 St. Mary's University

St. Mary's University Honors Theses and Projects

Previous studies on conditions like obesity, hypertension, and type 2 diabetes mellitus (T2DM) have explored the correlations between them and various other human conditions, including aortic stiffness, left ventricular hypertrophy, and sleep apnea, as they predict possibilities of developing certain diseases in Mexican Americans. This study aims to observe the correlation between lifestyle decisions that could relate to the onset of depression in normal, prediabetic, and diabetic individuals. These include smoking habits and alcohol consumption. Many papers have previously examined these lifestyle habits as they relate to obesity, hypertension, and diabetes; however, they have done so in a singular ...


Aspect-Based Sentiment Analysis Of Movie Reviews, Samuel Onalaja, Eric Romero, Bosang Yun 2021 Southern Methodist University

SMU Data Science Review

This study compares classification models used to determine aspect-based separated text sentiment and predict binary sentiments of movie reviews with genre- and aspect-specific driving factors. To gain a broader classification analysis, five machine and deep learning algorithms were compared: Logistic Regression (LR), Naive Bayes (NB), Support Vector Machine (SVM), and Recurrent Neural Network Long-Short-Term Memory (RNN LSTM). The various movie aspects that are utilized to separate the sentences are determined through aggregating aspect words from lexicon-based, supervised, and unsupervised learning. The driving factors are randomly assigned to various movie aspects and their impact tied to ...


Identification And Characterization Of Forest Fire Risk Zones Leveraging Machine Learning Methods, Joshua Balson, Matt Chinchilla, Cam Lu, Jeff Washburn, Nibhrat Lohia 2021 Southern Methodist University

SMU Data Science Review

Across the United States, record numbers of wildfires are observed costing billions of dollars in property damage, polluting the environment, and putting lives at risk. The ability of emergency management professionals, city planners, and private entities such as insurance companies to determine if an area is at higher risk of a fire breaking out has never been greater. This paper proposes a novel methodology for identifying and characterizing zones with increased risks of forest fires. Methods involving machine learning techniques use the widely available and recorded data, thus making it possible to implement the tool quickly.


Integration Of Blockchain Technology Into Automobiles To Prevent And Study The Causes Of Accidents, John Kim 2021 California State University, San Bernardino

Electronic Theses, Projects, and Dissertations

Automobile collisions occur daily. We now live in an information-driven world, one where technology is quickly evolving. Blockchain technology can change the automotive industry, the safety of the motoring public, and its surrounding environment by incorporating this vast array of information. It can place safety and efficiency at the forefront for pedestrians and public establishments, and provide public agencies with pertinent information securely and efficiently. Other industries in which Blockchain technology has been effective include supply chain management, logistics, and banking. This paper reviews some statistical information regarding automobile collisions, Blockchain technology, Smart Contracts, Smart Cities; assesses the feasibility ...


Data Consultations, Racism, And Critiquing Colonialism In Demographic Datasheets, Nina Exner, Erin Carrillo, Sam A. Leif 2021 Virginia Commonwealth University

Journal of eScience Librarianship

Objective: We consider how data librarians can take antiracist action in education and consultations. We attempt to apply QuantCrit thinking, particularly to demographic datasheets.

Methods: We synthesize historical context with modern critical thinking about race and data to examine the origins of current assumptions about data. We then present examples of how racial categories can hide, rather than reveal, racial disparities. Finally, we apply the Model of Domain Learning to explain why data science and data management experts can and should expose experts in subject research to the idea of critically examining demographic data collection.

Results: There are good reasons ...


Why Does An Ex-Offender Reoffend?, Jacob Rybak 2021 Kennesaw State University

Symposium of Student Scholars

What leads an offender to go back to prison? Iowa has collected data tracking recidivism to evaluate the effectiveness of its programs for released offenders. This data set includes the following for all of the offenders: age group, type of release (parole vs. being discharged at the end of their sentence), race, sex, year of release, supervising district, original offense, and whether they recidivated. For the offenders who return to prison, the data set includes measures on days to return, type of recidivism (technicality or new crime), and the specific offense that caused their return.

In the ...


Model-Free Descriptive Modeling For Multivariate Categorical Data With An Ordinal Dependent Variable, Li Wang 2021 University of Massachusetts Amherst

Doctoral Dissertations

In the process of statistical modeling, descriptive modeling plays an essential role in accelerating the formulation of plausible hypotheses in the subsequent explanatory modeling and facilitating the selection of potential variables in the subsequent predictive modeling. In particular, for multivariate categorical data analysis, it is desirable to use descriptive modeling methods for uncovering and summarizing the potential association structure among multiple categorical variables in a compact manner. However, many classical methods in this case either rely on strong assumptions for parametric models or become infeasible when the data dimension is higher. To this end, we propose a model-free method ...


Statistical Modeling For High-Dimensional Compositional Data With Applications To The Human Microbiome, Thy Dao 2021 University of Arkansas, Fayetteville

Graduate Theses and Dissertations

Compositional data refer to the data that lie on a simplex, which are common in many scientific domains such as genomics, geology, and economics. As the components in a composition must sum to one, traditional tests based on unconstrained data become inappropriate, and new statistical methods are needed to analyze this special type of data. This dissertation is motivated by some statistical problems arising in the analysis of compositional data. In particular, we focus on the high-dimensional and over-dispersed setting, where the dimensionality of compositions is greater than the sample size and the dispersion parameter is moderate or large. In ...


Privacy-Preserving Cloud-Assisted Data Analytics, Wei Bao 2021 University of Arkansas, Fayetteville

Graduate Theses and Dissertations

Nowadays industries are collecting a massive and exponentially growing amount of data that can be utilized to extract useful insights for improving various aspects of our life. Data analytics (e.g., via the use of machine learning) has been extensively applied to make important decisions in various real world applications. However, it is challenging for resource-limited clients to analyze their data in an efficient way when its scale is large. Additionally, the data resources are increasingly distributed among different owners. Nonetheless, users' data may contain private information that needs to be protected.

Cloud computing has become more and more popular ...


Knowledge Discovery From Complex Event Time Data With Covariates, Samira Karimi 2021 University of Arkansas, Fayetteville

Graduate Theses and Dissertations

In particular engineering applications, such as reliability engineering, complex types of data are encountered which require novel methods of statistical analysis. Handling covariates properly while managing the missing values is a challenging task. These types of issues happen frequently in reliability data analysis. Specifically, accelerated life testing (ALT) is usually conducted by exposing test units of a product to severer-than-normal conditions to expedite the failure process. The resulting lifetime and/or censoring data are often modeled by a probability distribution along with a life-stress relationship. However, if the probability distribution and life-stress relationship selected cannot adequately describe the underlying ...


Grizzly Bears Mortalities And The Survival Of The Species, Courtney Swanson 2021 University of Minnesota - Morris

Senior Seminars and Capstones

In this paper we aim to understand what is happening in grizzly bear population mortalities from 2010 to 2020. We apply Classification and Regression Tree (CART) methods and Correspondence Analysis to data provided by the U.S. Geological Survey (USGS). We found certain variables in the data set to be important through CART methods. Correspondence Analysis then allowed us to compare these variables to determine their relationships and associations with one another. Most of the grizzly bear deaths are human-caused and mainly over land and resources such as food and habitat. This aligns with some ...

