Open Access. Powered by Scholars. Published by Universities.®

Categorical Data Analysis Commons

457 Full-Text Articles | 681 Authors | 269,410 Downloads | 101 Institutions

All Articles in Categorical Data Analysis

457 full-text articles. Page 1 of 19.

Innovation Challenges In The Air Force SBIR Program: From The Small Businesses' Perspective, Hart J. Holt, Amy M. Cox, Scott Drylie, David S. Long, Alfred E. Thal Jr., Robert D. Fass 2024 Air Force Security and Cooperation

Faculty Publications

Every year the United States invests $3.2 billion in the Small Business Innovation Research (SBIR) program to promote innovation among the nation’s small businesses. Half of this investment is from the DoD. This research considers the challenges faced by small businesses innovating with the DoD, particularly those awarded SBIR contracts with the United States Air Force. The authors surveyed 286 unique small businesses that were previously awarded an Air Force SBIR contract. By asking the survey respondents open-ended questions and categorizing their responses, they pinpoint unaddressed challenges from the small business perspective. By categorizing survey responses through Qualitative Content Analysis, …


An Application Of An In-Depth Advanced Statistical Analysis In Exploring The Dynamics Of Depression, Sleep Deprivation, And Self-Esteem, Muslihat Gaffari 2024 East Tennessee State University

Electronic Theses and Dissertations

Depression, intertwined with sleep deprivation and self-esteem, presents a significant challenge to mental health worldwide. The research shown in this paper employs advanced statistical methodologies to unravel the complex interactions among these factors. Through log-linear homogeneous association, multinomial logistic regression, and generalized linear models, the study scrutinizes large datasets to uncover nuanced patterns and relationships. By elucidating how depression, sleep disturbances, and self-esteem intersect, the research aims to deepen understanding of mental health phenomena. The study clarifies the relationship between these variables and explores reasons for prioritizing depression research. It evaluates how statistical models, such as log-linear, multinomial logistic regression, …


Assessing GTFS Accuracy, Gregory L. Newmark 2024 Mineta Transportation Institute

Mineta Transportation Institute

The promised benefits of the General Transit Feed Specification (GTFS) Schedule and Realtime standards are dependent on the underlying quality of the data. Despite this fundamental reliance, there has been relatively little research on techniques and strategies to assess GTFS accuracy. The need for such assessment is growing as federal and state governments increasingly require transit agencies to make these data available to the public. This research fills this gap by presenting a suite of methods and metrics to assess the temporal accuracy of GTFS Realtime and the spatial accuracy of GTFS Schedule feeds. The temporal assessment demonstrates an approach …


Multi-Case Study Of Left-Flank Boundaries Within Supercells, Peyton B. Stevenson 2024 University of Nebraska-Lincoln

Department of Earth and Atmospheric Sciences: Dissertations, Theses, and Student Research

This study investigates the prevalence and significance of forward-flank convergence boundaries (FFCBs) and left-flank convergence boundaries (LFCBs) in shaping the structure and intensity of supercells, using observational data from various field projects. Unlike previous research focusing on individual cases, this study examines a diverse range of cases to provide comprehensive insights into the relationship between these boundaries and supercell characteristics such as intensity, longevity, and tornadogenesis. By analyzing high-resolution surface data, the research addresses the frequency, location, and intensity of these boundaries, and their impact on pseudo vertical vorticity, pseudo convergence, and density gradients. A total of 228 boundary identifications …


Intimacy Without The Chance Of Heartbreak For Richer, For Poorer, In Sickness & In Health, Cynthia Nguyen 2024 Seattle Pacific University

Honors Projects

The present study investigates the effect of the COVID-19 pandemic on the consumption of porn, shifts in the production of porn consumed between men and women, and the breakdown of any pattern in adult content via film, pictures, and audio. A quantitative analysis was conducted in R on data pulled from Pornhub, Reddit’s GoneWildAudio subreddit, and Archive of Our Own from 2018 to 2023. Statistical inference and modeling are used to attempt to find a pattern in the production of online porn across three mediums over several years before, during, and after the pandemic. Regardless of events …


Accessible Real-Time Eye-Gaze Tracking For Neurocognitive Health Assessments, A Multimodal Web-Based Approach, Daniel C. Tisdale 2024 California Polytechnic State University, San Luis Obispo

Master's Theses

We introduce a novel integration of real-time, predictive eye-gaze tracking models into a multimodal dialogue system tailored for remote health assessments. This system is designed to be highly accessible requiring only a conventional webcam for video input along with minimal cursor interaction and utilizes engaging gaze-based tasks that can be performed directly in a web browser. We have crafted dynamic subsystems that capture high-quality data efficiently and maintain quality through instances of user attrition and incomplete calls. Additionally, these subsystems are designed with the foresight to allow for future re-analysis using improved predictive models, as well as enable the creation …


Detection Of Deficiencies And Data Analysis Of Bridge Members With Deep Convolutional Neural Networks, Bennett Jackson 2024 University of Nebraska-Lincoln

Department of Civil and Environmental Engineering: Dissertations, Theses, and Student Research

Concrete cracks and structural steel corrosion are two of the most common defects in bridges. Quantifying and classifying these defects provide bridge inspectors and engineers with valuable data for assessing deterioration levels. However, the bridge inspection process is typically a subjective, time intensive, and tedious task, as defects can be overlooked or in locations not easily accessible. Previous studies have investigated deep learning-based inspection methods, implementing popular models such as Mask R-CNN and U-Net. The architectures of these models offer certain advantages depending on the required task. This thesis aims to evaluate and compare Mask R-CNN and U-Net regarding their …


Factors Predictive Of The Development Of Surgical Site Infection In Thyroidectomy, A Replication Study Of Myssiorek (2018), Kaitlyn M. Kenig 2024 University of Nebraska Medical Center

Capstone Experience

The original study aimed to show that thyroidectomy does not result in surgical site infection (SSI) in most cases, and thus routine prescription of antibiotics is not necessary. The study looked to see what risk factors could predict the incidence of SSI. This would highlight those individuals who were at most risk of developing SSI, and then antibiotics would only be prescribed to these individuals instead of all or most individuals who undergo thyroidectomy.

This study used NSQIP data to look at incidence of SSI and look for risk factors that may be predictive of SSI. Only surgeries that were …


Code For Care: Hypertension Prediction In Women Aged 18-39 Years, Kruti Sheth 2024 California State University, San Bernardino

Electronic Theses, Projects, and Dissertations

The longstanding prevalence of hypertension, often undiagnosed, poses significant risks of severe chronic and cardiovascular complications if left untreated. This study investigated the causes and underlying risks of hypertension in females aged between 18-39 years. The research questions were: (Q1.) What factors affect the occurrence of hypertension in females aged 18-39 years? (Q2.) What machine learning algorithms are suited for effectively predicting hypertension? (Q3.) How can SHAP values be leveraged to analyze the factors from model outputs? The findings are: (Q1.) Performing Feature selection using binary classification Logistic regression algorithm reveals an array of 30 most influential factors at an …


A Survey Of The Murray State University CSIS Department Of Student And Instructor Attitudes In Relation To Earlier Introduction Of Version Control Systems, Gavin Johnson 2024 Murray State University

Honors College Theses

Over the previous 20 years, the software development industry has overseen an evolution in application of Version Control Systems (VCS) from a Centralized Version Control System (CVCS) format to a Decentralized Version Control Format (DVCS). Examples of the former include Perforce and Subversion whilst the latter of the two include GitHub and Bitbucket. As DVCS models allow software contributors to maintain their respective local repositories of relevant code bases, developers are able to work offline and maintain their work with relative fault tolerance. This contrasts with CVCS models, which require software contributors to be connected online to a main server. …


Health And Healthcare: Designing For The Social Determinants Of Health And Blue Zones In North Nashville, Rebecca Tonguis, Honor Thomas, Olivia Hobbs 2024 Belmont University

Belmont University Research Symposium (BURS)

Owned by North Nashville’s First Community Church, a now empty site in the Osage-North Fisk neighborhood of North Nashville has been identified as a potential site for a new location of The Store, in addition to a community-centric architectural development based on the social determinants of health and informed by the principles behind Blue Zones, the locations with the highest lifespans in the world. Opened by Brad Paisley and Kimberly Williams-Paisley, The Store is a free grocery store that “allow[s] people to shop for their basic needs in a way that protects dignity and fosters hope”, for which North Nashville …


The Impact Of Data Preparation And Model Complexity On The Natural Language Classification Of Chinese News Headlines, Torrey J. Wagner, Dennis Guhl, Brent T. Langhals 2024 Air Force Institute of Technology

Faculty Publications

Given the emergence of China as a political and economic power in the 21st century, there is increased interest in analyzing Chinese news articles to better understand developing trends in China. Because of the volume of the material, automating the categorization of Chinese-language news articles by headline text or titles can be an effective way to sort the articles into categories for efficient review. A 383,000-headline dataset labeled with 15 categories from the Toutiao website was evaluated via natural language processing to predict topic categories. The influence of six data preparation variations on the predictive accuracy of four algorithms was …


Making Sense Of Making Parole In New York, Alexandra McGlinchy 2024 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

For many individuals incarcerated in New York, the initial step toward freedom begins with an interview with the Board of Parole. This process, however, is frequently a complex and challenging one, characterized by repeated denials and extended incarcerations. The disparity in outcomes – where one individual may receive over 20 denials and another is granted parole on their first attempt – highlights the ambiguity and inconsistency in the parole decision-making process. This project aims to clarify the factors that influence parole decisions by concentrating on measurable variables. These include age, race, duration of sentence served, proportion of sentence served, type …


Ensemble Classification: An Analysis Of The Random Forest Model, Jarod Korn 2024 The University of Akron

Williams Honors College, Honors Research Projects

The random forest model proposed by Dr. Leo Breiman in 2001 is an ensemble machine learning method for classification prediction and regression. In the following paper, we will conduct an analysis on the random forest model with a focus on how the model works, how it is applied in software, and how it performs on a set of data. To fully understand the model, we will introduce the concept of decision trees, give a summary of the CART model, explain in detail how the random forest model operates, discuss how the model is implemented in software, demonstrate the model by …


Imputation Strategies For Different Categories Of Missing Data, Karthik Chalumuri 2024 University of New Hampshire, Durham

Honors Theses and Capstones

Addressing missing data in research is crucial for ensuring the reliability and validity of study findings, yet it remains a significant challenge. This study investigates the impact of missing data on research outcomes and explores the underutilization of existing tools for managing missingness, potentially leading to gaps in critical information with tangible implications for decision-making processes (Dziura et al.).

Focusing on the different categories of missing data—Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR)—this research examines various imputation strategies tailored to each category. Specifically, we compare the efficacy of several model-based imputation methods, …
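
The three missingness categories named in this abstract can be illustrated with a short simulation. This is only a minimal sketch under stated assumptions, not the study's method: the function names, the threshold rule, and the probabilities `p_low` and `p_high` are hypothetical choices made for illustration.

```python
import random


def mcar_mask(n, p, rng):
    """MCAR: every value is missing with the same probability p."""
    return [rng.random() < p for _ in range(n)]


def mar_mask(covariate, p_low, p_high, threshold, rng):
    """MAR: missingness depends only on a fully observed covariate."""
    return [rng.random() < (p_high if x > threshold else p_low)
            for x in covariate]


def mnar_mask(target, p_low, p_high, threshold, rng):
    """MNAR: missingness depends on the (unobserved) target value itself."""
    return [rng.random() < (p_high if y > threshold else p_low)
            for y in target]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    age = [rng.uniform(20, 80) for _ in range(10)]        # always observed
    income = [rng.uniform(10, 100) for _ in range(10)]    # subject to missingness
    print(sum(mcar_mask(len(income), 0.3, rng)), "values missing under MCAR")
    print(sum(mar_mask(age, 0.1, 0.6, 50, rng)), "values missing under MAR")
    print(sum(mnar_mask(income, 0.1, 0.6, 60, rng)), "values missing under MNAR")
```

A mask entry of `True` marks a value that would be dropped. The only difference between the three functions is which variable drives the probability of missingness, and that difference is what separates the categories.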


Tropical Fish Study In Tahiti, French Polynesia, Miranda Brainard, Caitlyn Swango, Paityn Houglan, Richard Londraville 2024 The University of Akron

Williams Honors College, Honors Research Projects

In May of 2023, I embarked on an exciting research journey to Moorea, French Polynesia, alongside fellow students and faculty members from the University of Akron and Syracuse University. This expedition was part of the university-sponsored Tropical Vertebrate Biology course, where we delved into the exploration of various tropical species inhabiting the island, including sea urchins, geckos, and my primary focus, the blackspotted rockskipper.

My research team, composed of my co-authors and me, was particularly intrigued by the unique refuge-seeking behavior displayed by blackspotted rockskippers. These amphibious fish are renowned for their remarkable ability to inhabit tide pools and rocky …


Deep Learning One-Class Classification With Support Vector Methods, Hayden D. Hampton 2024 University of Central Florida

Graduate Thesis and Dissertation 2023-2024

Through the specialized lens of one-class classification, anomalies–irregular observations that uncharacteristically diverge from normative data patterns–are comprehensively studied. This dissertation focuses on advancing boundary-based methods in one-class classification, a critical approach to anomaly detection. These methodologies delineate optimal decision boundaries, thereby facilitating a distinct separation between normal and anomalous observations. Encompassing traditional approaches such as One-Class Support Vector Machine and Support Vector Data Description, recent adaptations in deep learning offer a rich ground for innovation in anomaly detection. This dissertation proposes three novel deep learning methods for one-class classification, aiming to enhance the efficacy and accuracy of anomaly detection in …


An Unsupervised Machine Learning Algorithm For Clustering Low Dimensional Data Points In Euclidean Grid Space, Josef Lazar 2024 Bard College

Senior Projects Spring 2024

Clustering algorithms provide a useful method for classifying data. The majority of well-known clustering algorithms are designed to find globular clusters; however, this is not always desirable. In this senior project I present a new clustering algorithm, GBCN (Grid Box Clustering with Noise), which applies a box grid to points in Euclidean space to identify areas of high point density. Points within the grid space that are in adjacent boxes are classified into the same cluster. Conversely, if a path from one point to another can only be completed by traversing an empty grid box, then they are classified …
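
The grid idea described in this abstract can be sketched as a connected-components search over occupied grid boxes. This is a minimal illustration of the general technique, not the author's GBCN implementation: the function name `grid_cluster`, the `cell_size` and `min_pts` parameters, the inclusion of diagonal neighbours, and the noise label `-1` are all assumptions.

```python
from collections import defaultdict, deque
from itertools import product
from math import floor


def grid_cluster(points, cell_size, min_pts=1):
    """Label points by connected components of occupied grid boxes.

    Returns one label per point: 0, 1, ... for clusters, -1 for noise.
    """
    # Assign every point index to the grid box that contains it.
    boxes = defaultdict(list)
    for i, p in enumerate(points):
        boxes[tuple(floor(x / cell_size) for x in p)].append(i)

    # Assumption: a box counts as occupied only if it holds >= min_pts points.
    dense = {b for b, idx in boxes.items() if len(idx) >= min_pts}

    labels = [-1] * len(points)
    seen = set()
    cluster = 0
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:  # breadth-first search over adjacent occupied boxes
            box = queue.popleft()
            for i in boxes[box]:
                labels[i] = cluster
            # Assumption: adjacency includes diagonal neighbours.
            for offset in product((-1, 0, 1), repeat=len(box)):
                nb = tuple(c + d for c, d in zip(box, offset))
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        cluster += 1
    return labels
```

Two points end up in different clusters exactly when every path of adjacent boxes between them crosses a box that is not occupied. Raising `min_pts` turns sparsely populated boxes into noise.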


Bayesian Variable Selection With Shrinkage Priors And Generative Adversarial Networks For Fraud Detection, Amina Issoufou Anaroua 2024 University of Central Florida

Graduate Thesis and Dissertation 2023-2024

This research paper focuses on fraud detection in the financial industry using Generative Adversarial Networks (GANs) in conjunction with Uni and Multi Variate Bayesian Model with Shrinkage Priors (BMSP). The problem addressed is the need for accurate and advanced fraud detection techniques due to the increasing sophistication of fraudulent activities. The methodology involves the implementation of GANs and the application of BMSP for variable selection to generate synthetic fraud samples for fraud detection using the augmented dataset. Experimental results demonstrate the effectiveness of the BMSP GAN approach in detecting fraud with improved performance compared to other methods. The conclusions drawn …


Scalar-On-Function Regression: Estimation And Inference Under Complex Survey Designs, Ekaterina Smirnova, Erjia Ciu, Lucia Tabacu, Andrew Leroux 2024 Virginia Commonwealth University

Mathematics & Statistics Faculty Publications

Increasingly, large, nationally representative health and behavioral surveys conducted under a multistage stratified sampling scheme collect high-dimensional data with correlation structured along some domain (e.g., wearable sensor data measured continuously and correlated over time, imaging data with spatiotemporal correlation) with the goal of associating these data with health outcomes. Analysis of this sort requires novel methodologic work at the intersection of survey statistics and functional data analysis. Here, we address this crucial gap in the literature by proposing an estimation and inferential framework for generalizable scalar-on-function regression models for data collected under a complex survey design. We propose to: …

