Open Access. Powered by Scholars. Published by Universities.®

Categorical Data Analysis Commons

353 Full-Text Articles; 578 Authors; 86,862 Downloads; 79 Institutions

All Articles in Categorical Data Analysis

Direct Questioning Of Sensitive Topics In Public Health Studies: A Simulation Study, Jessica K. Fox, Evrim Oral 2020 LSU Health Sciences Center, School of Public Health, Biostatistics Program

Annual Symposium on Biomathematics and Ecology Education and Research

No abstract provided.


Applying The Data: Predictive Analytics In Sport, Anthony Teeter, Margo Bergman 2020 University of Washington, Tacoma

Access*: Interdisciplinary Journal of Student Research and Scholarship

The history of wagering predictions and their impact on wide-reaching disciplines such as statistics and economics dates to at least the 1700s, if not before. Predicting the outcomes of sports is a multibillion-dollar business that capitalizes on these tools but is in constant development with the addition of big data analytics methods. Sportsline.com, a popular website for fantasy sports leagues, provides odds predictions in multiple sports, produces proprietary computer models of both winning and losing teams, and provides specific point estimates. To test likely candidates for inclusion in these prediction algorithms, the authors developed a computer model ...


Predicting Postoperative Delirium Risk For Intracranial Surgery: A Statistical Machine Learning Approach, Juliet Aygun, Alaina Bartfeld, Sahana Rayan 2020 Purdue University

The Journal of Purdue Undergraduate Research

No abstract provided.


Analyzing The Fractal Dimension Of Various Musical Pieces, Nathan Clark 2020 University of Arkansas, Fayetteville

Industrial Engineering Undergraduate Honors Theses

One of the most common tools for evaluating data is regression. This technique, widely used by industrial engineers, explores linear relationships between predictors and the response. Each observation of the response is a fixed linear combination of the predictors with an added error element. The method is built on the assumption that this error is normally distributed across all observations and has a mean of zero. In some cases, it has been found that the inherent variation is not the result of a random variable, but is instead the result of self-symmetric properties of the observations. For data with these ...


Improving The Quality And Design Of Retrospective Clinical Outcome Studies That Utilize Electronic Health Records, Oliwier Dziadkowiec, Jeffery S. Durbin, Vignesh Jayaraman Muralidharan, Megan L. Novak, Brendon T. Cornett 2020 HCA Healthcare Mountain MidAmerica and Continental Divisions

HCA Healthcare Journal of Medicine

Electronic health records (EHRs) are an excellent source for secondary data analysis. Studies based on EHR-derived data, if designed properly, can answer previously unanswerable clinical research questions. In this paper we highlight the benefits of large retrospective studies from secondary sources such as EHRs, examine the design challenges of retrospective cohort and case-control studies, and discuss methodological and statistical adjustments that can overcome some of the inherent design limitations and thereby increase the generalizability, validity, and reliability of the results obtained from these studies.


Learning Networks With Categorical Data Using Distance Correlation, And A Novel Graph-Based Multivariate Test, Jian Tinker 2020 University of Arkansas, Fayetteville

Theses and Dissertations

We study the use of distance correlation for statistical inference on categorical data, especially the induction of probability networks. Szekely et al. first defined distance correlation for continuous variables in [42], and Zhang translated the concept into the categorical setting in [57] by defining dCor(X,Y) for categorical variables X = (x_1, ..., x_I) and Y = (y_1, ..., y_J), where P(X = x_i) = π_i and P(Y = y_j) = π_j, with the formula [Please open the document]

Part I of the dissertation covers the background we need to understand this formula, and prepares us to analyze the properties and performance of ...


Analysis Of Gameplay Strategies In Hearthstone: A Data Science Approach, Connor W. Watson 2020 New Jersey Institute of Technology

Theses

In recent years, games have been a popular test bed for AI research, and the presence of Collectible Card Games (CCGs) in that space is still increasing. One such CCG, used for both competitive and casual play and for AI research, is Hearthstone, a two-player adversarial game in which each player seeks to implement one of several gameplay strategies to defeat their opponent by reducing the opponent's Health points to zero. Although some open source simulators exist, some of their methodologies for simulated agents create opponents with a relatively low skill level. Using evolutionary algorithms, this thesis seeks to evolve agents with a higher ...


Metabolomic Profiling Of Nicotiana Spp. Nectars Indicate That Pollinator Feeding Preference Is A Stronger Determinant Than Plant Phylogenetics In Shaping Nectar Diversity, Fredy A. Silva, Elizabeth C. Chatt, Siti-Nabilla Mahalim, Adel Guirgis, Xingche Guo, Dan S. Nettleton, Basil J. Nikolau, Robert W. Thornburg 2020 Iowa State University

Statistics Publications

Floral nectar is a rich secretion produced by the nectary gland and is offered as a reward to attract pollinators, leading to improved seed set. Nectars are composed of a complex mixture of sugars, amino acids, proteins, vitamins, lipids, and organic and inorganic acids. This composition is influenced by several factors, including floral morphology, mechanism of nectar secretion, time of flowering, and visitation by pollinators. The objective of this study was to determine the contributions of flowering time, plant phylogeny, and pollinator selection to nectar composition in Nicotiana. The main classes of nectar metabolites (sugars and amino acids) were quantified using gas ...


Decision Tree For Predicting The Party Of Legislators, Afsana Mimi 2020 CUNY New York City College of Technology

Publications and Research

The motivation of the project is to identify the legislators who frequently voted against their party in roll call votes, using data sets from the Office of the Clerk of the U.S. House of Representatives collected in 2018 and 2019. We construct a model to predict the parties of legislators based on their votes. The method used is a decision tree, a data mining technique. Python was used to collect raw data from the internet, SAS was used to clean the data, and all other calculations and graphical presentations were performed using R.


First-Year Computer Science Students: Pathways And Perceptions In Introductory Computer Science Courses, Christina A. LeBlanc 2020 University of Maine

Electronic Theses and Dissertations

This study examined student perceptions and experiences of an introductory Computer Science course at the University of Maine, COS 125: Introduction to Problem Solving Using Computer Programs. It also explored the pathways that students pursue after taking COS 125, depending on their success in the course and their motivation to persist. By characterizing student populations and their performance in their first semester in the Computer Science program, students can be placed into one of three categories that explain their path: a “continuer” (passed COS 125 and decided to stay in the major), a “persister” (did not pass COS 125 and ...


ACT Scores Across Minnesota's Congressional Districts, Katie Moynihan 2020 Concordia University St. Paul

Research and Scholarship Symposium Posters

Data analysis was conducted to test factors which could affect the ACT scores of Minnesota high school students. Average composite scores across the state’s eight congressional districts were evaluated. Factors studied include family income, parental education, diversity, district location, graduation class size, and graduation rate. Methodology and results will be discussed.


Do We Need To Reconsider The CMAM Admission And Discharge Criteria?; An Analysis Of CMAM Data In South Sudan, Eunyong Ahn, Cyprian Ouma, Mesfin Loha, Asrat Dibaba, Wendy Dyment, Jae Kwang Kim, Nam Seon Beck, Taesung Park 2020 Seoul National University

Statistics Publications

Background: Weight-for-height Z-score (WHZ) and Mid Upper Arm Circumference (MUAC) are both commonly used as acute malnutrition screening criteria. However, the groups they identify as malnourished differ. We therefore aim to investigate the clinical features, and the linkage with chronicity, of the acute malnutrition cases identified by either WHZ or MUAC. In addition, there is evidence that fat restoration is disproportionately rapid compared with muscle gain in hospitalized malnourished children, but related research at the community level is lacking. In this study we suggest a proxy measure to inspect body composition restoration responding to malnutrition management ...


Using Alteryx Designer In Audit, Nolan Asiala 2020 Grand Valley State University

Honors Projects

My senior project was built around data analysis and how it relates to the auditing profession. Initially, I was planning on attending a data analytics competition, but that was canceled due to the events of COVID-19. This project utilized the Alteryx Designer program to demonstrate how it can be used during an audit engagement. By creating a workflow in Alteryx Designer, a report from a client can be cleaned and reformatted into a working dataset. My project includes two Excel files, a Microsoft Word document that serves as a brief introduction to the program, and a video describing the workflow ...


How Data Is Changing The World Of Healthcare, Cameron Marous 2020 Ohio Northern University

Honors Capstone Enhancement Presentations

No abstract provided.


Evaluation Of Text Mining Techniques Using Twitter Data For Hurricane Disaster Resilience, Joshua Eason, Sathish Kumar 2020 Creighton University

SDSU Data Science Symposium

Data obtained from social media microblogging websites such as Twitter provide the unique ability to collect and analyze conversations of the public in order to gain perspective on the thoughts and feelings of the general public. Sentiment and volume analysis techniques were applied to the dataset in order to gain an understanding of the amount and level of sentiment associated with certain disaster-related tweets, including a topical analysis of specific terms. This study showed that disaster-type events such as a hurricane can cause some strong negative sentiment in the period of time directly preceding the event, but ultimately returns quickly ...


Informal Professional Development On Twitter: Exploring The Online Communities Of Mathematics Educators, Jaymie Ruddock 2020 Southern Methodist University

SMU Journal of Undergraduate Research

Professional development in its most traditional form is a classroom setting with a lecturer and an overwhelming amount of information. It is no surprise, then, that informal professional development away from institutions and on the teacher's own terms is a growing phenomenon due to an increased presence of educators on social media. These communities of educators use hashtags to broadcast to each other, with general hashtags such as #edchat having the broadest audience. However, many math educators use the hashtags #ITeachMath and #MTBoS, communities I was interested in learning more about. I built a Python script that used Tweepy to ...


Mapping Relationships And Positions Of Objects In Images Using Mask And Bounding Box Data, Jaime M. Villanueva Jr, Anantharam Subramanian, Vishal Ahir, Andrew Pollock 2020 Southern Methodist University

SMU Data Science Review

In this paper we present novel methods for automatically annotating images with relationship and position tags that are derived using mask and bounding box data. A Mask Region-based Convolutional Neural Network (Mask R-CNN) is used as the foundation for the object detection process. The relationships are found by manipulating the bounding box and mask segmentation outputs of a Mask R-CNN. The absolute positions, the positions of the objects relative to the image, and the relative positions, the positions of objects relative to the other objects, are then associated with the images as annotations that are output in order ...


Statistical Data Integration In Survey Sampling: A Review, Shu Yang, Jae Kwang Kim 2020 North Carolina State University

Statistics Publications

Finite population inference is a central goal in survey sampling. Probability sampling is the main statistical approach to finite population inference. Challenges arise due to high cost and increasing non-response rates. Data integration provides a timely solution by leveraging multiple data sources to provide more robust and efficient inference than using any single data source alone. The technique for data integration varies depending on types of samples and available information to be combined. This article provides a systematic review of data integration techniques for combining probability samples, probability and non-probability samples, and probability and big data samples. We discuss a ...


Accuracy Of AVS Life Expectancy Reports, Ariya Aghababa 2020 The University of Akron

Williams Honors College, Honors Research Projects

Use insurance company data to predict the trends in life insurance life expectancy reports. Also, use the data to predict what impairments could potentially decrease or increase an insured's life expectancy based on reports created by various Actuaries at life settlement companies.


Teaching Introductory Statistics With DataCamp, Benjamin Baumer, Andrew P. Bray, Mine Çetinkaya-Rundel, Johanna S. Hardin 2020 Smith College

Statistical and Data Sciences: Faculty Publications

We designed a sequence of courses for the DataCamp online learning platform that approximates the content of a typical introductory statistics course. We discuss the design and implementation of these courses and illustrate how they can be successfully integrated into a brick-and-mortar class. We reflect on the process of creating content for online consumers, ruminate on the pedagogical considerations we faced, and describe an R package for statistical inference that became a by-product of this development process. We discuss the pros and cons of creating the course sequence and express our view that some aspects were particularly problematic. The issues ...

