Open Access. Powered by Scholars. Published by Universities.®

Categorical Data Analysis Commons

374 Full-Text Articles 555 Authors 182,663 Downloads 88 Institutions

All Articles in Categorical Data Analysis

Aspect-Based Sentiment Analysis Of Movie Reviews, Samuel Onalaja, Eric Romero, Bosang Yun 2021 Southern Methodist University

SMU Data Science Review

This study compares classification models used to determine aspect-based separated text sentiment and to predict binary sentiments of movie reviews with genre- and aspect-specific driving factors. To gain a broader classification analysis, five machine and deep learning algorithms were compared: Logistic Regression (LR), Naive Bayes (NB), Support Vector Machine (SVM), and Recurrent Neural Network Long Short-Term Memory (RNN LSTM). The various movie aspects that are utilized to separate the sentences are determined through aggregating aspect words from lexicon-based, supervised and unsupervised learning. The driving factors are randomly assigned to various movie aspects and their impact tied to ...


Identification And Characterization Of Forest Fire Risk Zones Leveraging Machine Learning Methods, Joshua Balson, Matt Chinchilla, Cam Lu, Jeff Washburn, Nibhrat Lohia 2021 Southern Methodist University

SMU Data Science Review

Across the United States, record numbers of wildfires are observed costing billions of dollars in property damage, polluting the environment, and putting lives at risk. The ability of emergency management professionals, city planners, and private entities such as insurance companies to determine if an area is at higher risk of a fire breaking out has never been greater. This paper proposes a novel methodology for identifying and characterizing zones with increased risks of forest fires. Methods involving machine learning techniques use the widely available and recorded data, thus making it possible to implement the tool quickly.


Integration Of Blockchain Technology Into Automobiles To Prevent And Study The Causes Of Accidents, John Kim 2021 California State University, San Bernardino

Electronic Theses, Projects, and Dissertations

Automobile collisions occur daily. We now live in an information-driven world, one where technology is quickly evolving. Blockchain technology can change the automotive industry, the safety of the motoring public and its surrounding environment by incorporating this vast array of information. It can place safety and efficiency at the forefront for pedestrians and public establishments, and provide public agencies with pertinent information securely and efficiently. Other industries where Blockchain technology has been effective include supply chain management, logistics, and banking. This paper reviews some statistical information regarding automobile collisions, Blockchain technology, Smart Contracts, Smart Cities; assesses the feasibility ...


Data Consultations, Racism, And Critiquing Colonialism In Demographic Datasheets, Nina Exner, Erin Carrillo, Sam A. Leif 2021 Virginia Commonwealth University

Journal of eScience Librarianship

Objective: We consider how data librarians can take antiracist action in education and consultations. We attempt to apply QuantCrit thinking, particularly to demographic datasheets.

Methods: We synthesize historical context with modern critical thinking about race and data to examine the origins of current assumptions about data. We then present examples of how racial categories can hide, rather than reveal, racial disparities. Finally, we apply the Model of Domain Learning to explain why data science and data management experts can and should expose experts in subject research to the idea of critically examining demographic data collection.

Results: There are good reasons ...


Why Does An Ex-Offender Reoffend?, Jacob Rybak 2021 Kennesaw State University

Symposium of Student Scholars

What leads an offender to go back to prison? Iowa has collected data tracking recidivism to evaluate the effectiveness of its programs for released offenders. This data set includes the following for all of the offenders: age groups, type of release (parole vs. being discharged at the end of their sentence), race, sex, year of release, supervising district, original offense, and whether they recidivated. For the offenders who return to prison, the data set includes measures on days to return, type of recidivism (technicality or new crime), and the specific offense that caused their return.

In the ...


Model-Free Descriptive Modeling For Multivariate Categorical Data With An Ordinal Dependent Variable, Li Wang 2021 University of Massachusetts Amherst

Doctoral Dissertations

In the process of statistical modeling, descriptive modeling plays an essential role in accelerating the formulation of plausible hypotheses in the subsequent explanatory modeling and facilitating the selection of potential variables in the subsequent predictive modeling. In particular, for multivariate categorical data analysis, it is desirable to use descriptive modeling methods for uncovering and summarizing the potential association structure among multiple categorical variables in a compact manner. However, many classical methods in this case either rely on strong assumptions for parametric models or become infeasible when the data dimension is higher. To this end, we propose a model-free method ...


Statistical Modeling For High-Dimensional Compositional Data With Applications To The Human Microbiome, Thy Dao 2021 University of Arkansas, Fayetteville

Graduate Theses and Dissertations

Compositional data refer to the data that lie on a simplex, which are common in many scientific domains such as genomics, geology, and economics. As the components in a composition must sum to one, traditional tests based on unconstrained data become inappropriate, and new statistical methods are needed to analyze this special type of data. This dissertation is motivated by some statistical problems arising in the analysis of compositional data. In particular, we focus on the high-dimensional and over-dispersed setting, where the dimensionality of compositions is greater than the sample size and the dispersion parameter is moderate or large. In ...


Knowledge Discovery From Complex Event Time Data With Covariates, Samira Karimi 2021 University of Arkansas, Fayetteville

Graduate Theses and Dissertations

In particular engineering applications, such as reliability engineering, complex types of data are encountered which require novel methods of statistical analysis. Handling covariates properly while managing the missing values is a challenging task. These types of issues occur frequently in reliability data analysis. Specifically, accelerated life testing (ALT) data are usually collected by exposing test units of a product to more-severe-than-normal conditions to expedite the failure process. The resulting lifetime and/or censoring data are often modeled by a probability distribution along with a life-stress relationship. However, if the probability distribution and life-stress relationship selected cannot adequately describe the underlying ...


Privacy-Preserving Cloud-Assisted Data Analytics, Wei Bao 2021 University of Arkansas, Fayetteville

Graduate Theses and Dissertations

Nowadays industries are collecting a massive and exponentially growing amount of data that can be utilized to extract useful insights for improving various aspects of our life. Data analytics (e.g., via the use of machine learning) has been extensively applied to make important decisions in various real world applications. However, it is challenging for resource-limited clients to analyze their data in an efficient way when its scale is large. Additionally, the data resources are increasingly distributed among different owners. Nonetheless, users' data may contain private information that needs to be protected.

Cloud computing has become more and more popular ...


Grizzly Bears Mortalities And The Survival Of The Species, Courtney Swanson 2021 University of Minnesota - Morris

Senior Seminars and Capstones

In this paper we aim to understand what is happening in the grizzly bear population mortalities from the year 2010 to 2020. We apply Classification and Regression Tree (CART) methods and Correspondence Analysis to data provided by the U.S. Geological Survey (USGS). We found certain variables in the data set to be important through CART methods. Correspondence Analysis then allowed us to compare these variables to determine their relationships and associations with one another. Most of the grizzly bear deaths are human caused and mainly over land and resources such as food and habitat. This aligns with some ...


Data Analysis And Visualization To Dismantle Gender Discrimination In The Field Of Technology, Quinn Bolewicki 2021 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

In the United States, a significant population is facing an uphill battle trying to thrive in an industry that has seen exponential growth in recent years. Women, who account for approximately 50.8% of the U.S. population, are statistically underpaid and underrepresented in science, technology, engineering, and mathematics (STEM). Despite women-led technology teams establishing a 21% greater return on investment than teams not led by women, and young women largely outperforming men in math according to a 2015 study, there are only three Fortune 500 companies led by women, and women comprise only 10% of internet entrepreneurs. Research generates hundreds ...


Why Does An Ex-Offender Reoffend?, Jacob Rybak 2021 Kennesaw State University

Symposium of Student Scholars

What leads an offender to go back to prison? This researcher has lived in the Georgia State prison system for 3.5 years. Using personal insights as well as analytics, this researcher analyzes Iowa state’s six-year data set tracking recidivism of released offenders and recommends changes to the prison system to address the analytical findings.

The Iowa recidivism data set includes the following information for all offenders: age group, type of release (parole vs different discharges), release year, original offense, and whether they recidivated. For the recidivating offenders, the data set includes the days to return to prison, the ...


Access To Higher Education: Do Schools “Grant” Success?, Nathaniel Jones 2021 Kennesaw State University

Symposium of Student Scholars

University education can lead to upward income mobility for low-income students. Being exposed to other students’ life experiences that differ from their own may highlight activities and actions that they may want to consider to aid their success. According to the U.S. Bureau of Labor Statistics, the median weekly earnings in 2019 for all workers in the U.S. were $969. Of those, U.S. workers who held bachelor’s degrees earned $1,248. In 2016, the Brookings Institution found that Pell Grant recipients and first-generation student loan borrowers attended universities that had lower graduation rates and higher ...


Use Of Linear Discriminant Analysis In Song Classification: Modeling Based On Wilco Albums, Caroline Pollard 2021 University of Mississippi

Honors Theses

The study of music recommender algorithms is a relatively new area of study. Although these algorithms serve a variety of functions, they primarily help advertise and suggest music to users on music streaming services. This thesis explores the use of linear discriminant analysis in music categorization for the purpose of serving as a cheaper and simpler content-based recommender algorithm. The use of linear discriminant analysis was tested by creating linear discriminant functions that classify Wilco’s songs into their respective albums, specifically A.M., Yankee Hotel Foxtrot, and Sky Blue Sky. Four sample songs were chosen from each album, and song ...


A Comparison Of The Localized Aviation MOS Program (LAMP) And Terminal Aerodrome Forecast (TAF) Accuracy For General Aviation, Douglas D. Boyd, Thomas A. Guinn 2021 Embry Riddle Aeronautical University

Journal of Aviation Technology and Engineering

Background. For general aviation (GA) pilots, operations in instrument meteorological conditions (IMC) carry an elevated risk of a fatal accident. Aerodrome-specific forecasts (TAF, LAMP) provide guidance as to whether a general aviation flight can be safely undertaken. Although LAMP forecasts are more common for GA-frequented aerodromes, the FAA recommends that for such aerodromes (for which a TAF is not issued) the airman use the TAF generated for the geographically closest airport for pre-flight weather evaluation. Herein, for non-TAF-issuing airports, the LAMP (sLAMP) predictive accuracy for visual (VFR) and instrument (IFR) flight rules flight category was determined.

Method. sLAMP ...


How Risk-Related Statistics, As Reported In News And Social Media, Are Linked To The Use Of The Public Transit System, Prashiddhi Pokhrel 2021 University of Southern Maine

Thinking Matters Symposium

Due to the pandemic, people have started relying more on television, news, social media, and other outlets for guidance. Moreover, with the increasing amount of news, data, and information, there is also an increase in the amount of misleading statistics. People’s opinions and decisions depend significantly on the data, statistics, and information that they are exposed to, as well as their sources. For this project, we want to look at how information and its sources are affecting the decisions the general public makes about using the Portland Transit System. It is very important to know ...


Application Of Cycle-By-Cycle Analysis To EEG Data From Individuals With Phelan-McDermid Syndrome, Naomi Miller 2021 Dartmouth College

ENGS 88 Honors Thesis (AB Students)

This study aimed to analyze a novel method of processing data from electroencephalography (EEG) recordings, which implements time-domain cycle-by-cycle analysis. This "bycycle" method, developed by the Cole & Voytek laboratory, was implemented on an EEG dataset of children with and without Phelan-McDermid Syndrome in the hopes of uncovering network-level explanations for the genetic disorder. A supplemental Python pipeline was developed to organize and visualize the data. This led to the discovery of group-level differences in measures of cycle symmetry in alpha band waves over the sensorimotor electrodes. Through the same pipeline, the bycycle tool was validated as a sound EEG analysis ...


Behavior Of Lightning In Developing Storms, Erick A. Tello 2021 Air Force Institute of Technology

Theses and Dissertations

Air Force weather squadrons issue a warning when lightning activity is observed within 5 nautical miles (NM) of protected areas. Upon receiving this warning, personnel outdoors are expected to pause work and move inside. Studies sponsored by the 45th Weather Squadron (45 WS) have concluded that the 5 NM warning radius can be safely reduced for well-developed storms. This thesis investigates whether radii for storms in early development can also be reduced. Our research develops algorithms to partition lightning sensor data into storms. Next, storms are filtered to their earliest lightning events, and the study calculates distances between successive early ...


Node Classification On Relational Graphs Using Deep-RGCNs, Nagasai Chandra 2021 California Polytechnic State University, San Luis Obispo

Master's Theses

Knowledge Graphs are fascinating concepts in machine learning as they can hold usefully structured information in the form of entities and their relations. Despite the valuable applications of such graphs, most knowledge bases remain incomplete. This missing information harms downstream applications such as information retrieval and opens a window for research in statistical relational learning tasks such as node classification and link prediction. This work proposes a deep learning framework based on existing relational convolutional (R-GCN) layers to learn on highly multi-relational data characteristic of realistic knowledge graphs for node property classification tasks. We propose a deep and improved variant ...


Carbon Dioxide And Particulate Matter Concentration On Hampton Roads Air Quality, Gregory Hubbard 2021 Old Dominion University

OUR Journal: ODU Undergraduate Research Journal

Hampton Roads has been a maritime crossroads for the last 400 years. Industrialization has impacted the coastal region for the last 250 years. The expansion of the Port of Virginia in 2019 has created dense traffic in the region, resulting in impacts to air quality. Two waste products that affect humans are particulate matter and carbon dioxide. Both emissions can cause adverse effects on humans, such as asthma, some lung cancers, and other respiratory distress. Scientists and health practitioners are studying the effects of particulate matter on human health. Hampton Roads, in particular, because of its unique location on ...

