Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Open Access. Powered by Scholars. Published by Universities.®

Categorical Data Analysis

Institution
Keyword
Publication Year
Publication
Publication Type
File Type

Articles 1 - 30 of 100

Full-Text Articles in Computer Sciences

Code For Care: Hypertension Prediction In Women Aged 18-39 Years, Kruti Sheth May 2024

Code For Care: Hypertension Prediction In Women Aged 18-39 Years, Kruti Sheth

Electronic Theses, Projects, and Dissertations

The longstanding prevalence of hypertension, often undiagnosed, poses significant risks of severe chronic and cardiovascular complications if left untreated. This study investigated the causes and underlying risks of hypertension in females aged between 18-39 years. The research questions were: (Q1.) What factors affect the occurrence of hypertension in females aged 18-39 years? (Q2.) What machine learning algorithms are suited for effectively predicting hypertension? (Q3.) How can SHAP values be leveraged to analyze the factors from model outputs? The findings are: (Q1.) Performing Feature selection using binary classification Logistic regression algorithm reveals an array of 30 most influential factors at an …


A Survey Of The Murray State University Csis Department Of Student And Instructor Attitudes In Relation To Earlier Introduction Of Version Control Systems, Gavin Johnson Apr 2024

A Survey Of The Murray State University Csis Department Of Student And Instructor Attitudes In Relation To Earlier Introduction Of Version Control Systems, Gavin Johnson

Honors College Theses

Over the previous 20 years, the software development industry has overseen an evolution in application of Version Control Systems (VCS) from a Centralized Version Control System (CVCS) format to a Decentralized Version Control Format (DVCS). Examples of the former include Perforce and Subversion whilst the latter of the two include Github and BitBucket. As DVCS models allow software contributors to maintain their respective local repositories of relevant code bases, developers are able to work offline and maintain their work with relative fault tolerance. This contrasts to CVCS models, which require software contributors to be connected online to a main server. …


The Impact Of Data Preparation And Model Complexity On The Natural Language Classification Of Chinese News Headlines, Torrey J. Wagner, Dennis Guhl, Brent T. Langhals Mar 2024

The Impact Of Data Preparation And Model Complexity On The Natural Language Classification Of Chinese News Headlines, Torrey J. Wagner, Dennis Guhl, Brent T. Langhals

Faculty Publications

Given the emergence of China as a political and economic power in the 21st century, there is increased interest in analyzing Chinese news articles to better understand developing trends in China. Because of the volume of the material, automating the categorization of Chinese-language news articles by headline text or titles can be an effective way to sort the articles into categories for efficient review. A 383,000-headline dataset labeled with 15 categories from the Toutiao website was evaluated via natural language processing to predict topic categories. The influence of six data preparation variations on the predictive accuracy of four algorithms was …


Reducing Food Scarcity: The Benefits Of Urban Farming, S.A. Claudell, Emilio Mejia Dec 2023

Reducing Food Scarcity: The Benefits Of Urban Farming, S.A. Claudell, Emilio Mejia

Journal of Nonprofit Innovation

Urban farming can enhance the lives of communities and help reduce food scarcity. This paper presents a conceptual prototype of an efficient urban farming community that can be scaled for a single apartment building or an entire community across all global geoeconomics regions, including densely populated cities and rural, developing towns and communities. When deployed in coordination with smart crop choices, local farm support, and efficient transportation then the result isn’t just sustainability, but also increasing fresh produce accessibility, optimizing nutritional value, eliminating the use of ‘forever chemicals’, reducing transportation costs, and fostering global environmental benefits.

Imagine Doris, who is …


Identifying Key Activity Indicators In Rats' Neuronal Data Using Lasso Regularized Logistic Regression, Avery Woods May 2023

Identifying Key Activity Indicators In Rats' Neuronal Data Using Lasso Regularized Logistic Regression, Avery Woods

Honors Theses

This thesis aims to identify timestamps of rats’ neuronal activity that best determine behavior using a machine learning model. Neuronal data is a complex and high-dimensional dataset, and identifying the most informative features is crucial for understanding the underlying neuronal processes. The Lasso regularization technique is employed to select the most relevant features of the data to the model’s prediction. The results of this study provide insights into the key activity indicators that are associated with specific behaviors or cognitive processes in rats, as well as the effect that stress can have on neuronal activity and behavior. Ultimately, it was …


Fraud Pattern Detection For Nft Markets, Andrew Leppla, Jorge Olmos, Jaideep Lamba Mar 2023

Fraud Pattern Detection For Nft Markets, Andrew Leppla, Jorge Olmos, Jaideep Lamba

SMU Data Science Review

Non-Fungible Tokens (NFTs) enable ownership and transfer of digital assets using blockchain technology. As a relatively new financial asset class, NFTs lack robust oversight and regulations. These conditions create an environment that is susceptible to fraudulent activity and market manipulation schemes. This study examines the buyer-seller network transactional data from some of the most popular NFT marketplaces (e.g., AtomicHub, OpenSea) to identify and predict fraudulent activity. To accomplish this goal multiple features such as price, volume, and network metrics were extracted from NFT transactional data. These were fed into a Multiple-Scale Convolutional Neural Network that predicts suspected fraudulent activity based …


A Deep Bilstm Machine Learning Method For Flight Delay Prediction Classification, Desmond B. Bisandu Phd, Irene Moulitsas Phd Jan 2023

A Deep Bilstm Machine Learning Method For Flight Delay Prediction Classification, Desmond B. Bisandu Phd, Irene Moulitsas Phd

Journal of Aviation/Aerospace Education & Research

This paper proposes a classification approach for flight delays using Bidirectional Long Short-Term Memory (BiLSTM) and Long Short-Term Memory (LSTM) models. Flight delays are a major issue in the airline industry, causing inconvenience to passengers and financial losses to airlines. The BiLSTM and LSTM models, powerful deep learning techniques, have shown promising results in a classification task. In this study, we collected a dataset from the United States (US) Bureau of Transportation Statistics (BTS) of flight on-time performance information and used it to train and test the BiLSTM and LSTM models. We set three criteria for selecting highly important features …


Knowledge Discovery On The Integrative Analysis Of Electrical And Mechanical Dyssynchrony To Improve Cardiac Resynchronization Therapy, Zhuo He Jan 2023

Knowledge Discovery On The Integrative Analysis Of Electrical And Mechanical Dyssynchrony To Improve Cardiac Resynchronization Therapy, Zhuo He

Dissertations, Master's Theses and Master's Reports

Cardiac resynchronization therapy (CRT) is a standard method of treating heart failure by coordinating the function of the left and right ventricles. However, up to 40% of CRT recipients do not experience clinical symptoms or cardiac function improvements. The main reasons for CRT non-response include: (1) suboptimal patient selection based on electrical dyssynchrony measured by electrocardiogram (ECG) in current guidelines; (2) mechanical dyssynchrony has been shown to be effective but has not been fully explored; and (3) inappropriate placement of the CRT left ventricular (LV) lead in a significant number of patients.

In terms of mechanical dyssynchrony, we utilize an …


Cov-Inception: Covid-19 Detection Tool Using Chest X-Ray, Aswini Thota, Ololade Awodipe, Rashmi Patel Sep 2022

Cov-Inception: Covid-19 Detection Tool Using Chest X-Ray, Aswini Thota, Ololade Awodipe, Rashmi Patel

SMU Data Science Review

Since the pandemic started, researchers have been trying to find a way to detect COVID-19 which is a cost-effective, fast, and reliable way to keep the economy viable and running. This research details how chest X-ray radiography can be utilized to detect the infection. This can be for implementation in Airports, Schools, and places of business. Currently, Chest imaging is not a first-line test for COVID-19 due to low diagnostic accuracy and confounding with other viral pneumonia. Different pre-trained algorithms were fine-tuned and applied to the images to train the model and the best model obtained was fine-tuned InceptionV3 model …


Computer Aided Diagnosis System For Breast Cancer Using Deep Learning., Asma Baccouche Aug 2022

Computer Aided Diagnosis System For Breast Cancer Using Deep Learning., Asma Baccouche

Electronic Theses and Dissertations

The recent rise of big data technology surrounding the electronic systems and developed toolkits gave birth to new promises for Artificial Intelligence (AI). With the continuous use of data-centric systems and machines in our lives, such as social media, surveys, emails, reports, etc., there is no doubt that data has gained the center of attention by scientists and motivated them to provide more decision-making and operational support systems across multiple domains. With the recent breakthroughs in artificial intelligence, the use of machine learning and deep learning models have achieved remarkable advances in computer vision, ecommerce, cybersecurity, and healthcare. Particularly, numerous …


Split Classification Model For Complex Clustered Data, Katherine Gerot Mar 2022

Split Classification Model For Complex Clustered Data, Katherine Gerot

Honors Theses

Classification in high-dimensional data has generated tremendous interest in a multitude of fields. Data in higher dimensions often tend to reside in non-Euclidean metric space. This prevents Euclidean-based classification methodologies, such as regression, from reliably modeling the data. Many proposed models rely on computationally-complex embedding to convert the data to a more usable format. Others, namely the Support Vector Machine, rely on kernel manipulation to implicitly describe the "feature space" to arrive at a non-linear decision boundary. The proposed methodology in this paper seeks to classify complex data in a relatively computationally-simple and explainable manner.


Session 5: Equipment Finance Credit Risk Modeling - A Case Study In Creative Model Development & Nimble Data Engineering, Edward Krueger, Landon Thompson, Josh Moore Feb 2022

Session 5: Equipment Finance Credit Risk Modeling - A Case Study In Creative Model Development & Nimble Data Engineering, Edward Krueger, Landon Thompson, Josh Moore

SDSU Data Science Symposium

This presentation will focus first on providing an overview of Channel and the Risk Analytics team that performed this case study. Given that context, we’ll then dive into our approach for building the modeling development data set, techniques and tools used to develop and implement the model into a production environment, and some of the challenges faced upon launch. Then, the presentation will pivot to the data engineering pipeline. During this portion, we will explore the application process and what happens to the data we collect. This will include how we extract & store the data along with how it …


The Data Analytics And The Science Revolution, Leila Halawi, Amal Clarke, Kelly George Feb 2022

The Data Analytics And The Science Revolution, Leila Halawi, Amal Clarke, Kelly George

Publications

This text highlights the difference between analytics and data science, using predictive analytic techniques to analyze different historical data, including aviation data and concrete data, interpreting the predictive models, and highlighting the steps to deploy the models and the steps ahead. The book combines the conceptual perspective and a hands-on approach to predictive analytics using SAS VIYA, an analytic and data management platform. The authors use SAS VIYA to focus on analytics to solve problems, highlight how analytics is applied in the airline and business environment, and compare several different modeling techniques. They decipher complex algorithms to demonstrate how they …


A Predictive Model To Predict Cyberattack Using Self-Normalizing Neural Networks, Oluwapelumi Eniodunmo Jan 2022

A Predictive Model To Predict Cyberattack Using Self-Normalizing Neural Networks, Oluwapelumi Eniodunmo

Theses, Dissertations and Capstones

Cyberattack is a never-ending war that has greatly threatened secured information systems. The development of automated and intelligent systems provides more computing power to hackers to steal information, destroy data or system resources, and has raised global security issues. Statistical and Data mining tools have received continuous research and improvements. These tools have been adopted to create sophisticated intrusion detection systems that help information systems mitigate and defend against cyberattacks. However, the advancement in technology and accessibility of information makes more identifiable elements that can be used to gain unauthorized access to systems and resources. Data mining and classification tools …


Realtime Event Detection In Sports Sensor Data With Machine Learning, Mallory Cashman Jan 2022

Realtime Event Detection In Sports Sensor Data With Machine Learning, Mallory Cashman

Honors Theses and Capstones

Machine learning models can be trained to classify time series based sports motion data, without reliance on assumptions about the capabilities of the users or sensors. This can be applied to predict the count of occurrences of an event in a time period. The experiment for this research uses lacrosse data, collected in partnership with SPAITR - a UNH undergraduate startup developing motion tracking devices for lacrosse. Decision Tree and Support Vector Machine (SVM) models are trained and perform with high success rates. These models improve upon previous work in human motion event detection and can be used a reference …


Integration Of Blockchain Technology Into Automobiles To Prevent And Study The Causes Of Accidents, John Kim Dec 2021

Integration Of Blockchain Technology Into Automobiles To Prevent And Study The Causes Of Accidents, John Kim

Electronic Theses, Projects, and Dissertations

Automobile collisions occur daily. We now live in an information-driven world, one where technology is quickly evolving. Blockchain technology can change the automotive industry, the safety of the motoring public and its surrounding environment by incorporating this vast array of information. It can place safety and efficiency at the forefront to pedestrians, public establishments, and provide public agencies with pertinent information securely and efficiently. Other industries where Blockchain technology has been effective in are as follows: supply chain management, logistics, and banking. This paper reviews some statistical information regarding automobile collisions, Blockchain technology, Smart Contracts, Smart Cities; assesses the feasibility …


Privacy-Preserving Cloud-Assisted Data Analytics, Wei Bao Jul 2021

Privacy-Preserving Cloud-Assisted Data Analytics, Wei Bao

Graduate Theses and Dissertations

Nowadays industries are collecting a massive and exponentially growing amount of data that can be utilized to extract useful insights for improving various aspects of our life. Data analytics (e.g., via the use of machine learning) has been extensively applied to make important decisions in various real world applications. However, it is challenging for resource-limited clients to analyze their data in an efficient way when its scale is large. Additionally, the data resources are increasingly distributed among different owners. Nonetheless, users' data may contain private information that needs to be protected.

Cloud computing has become more and more popular in …


Node Classification On Relational Graphs Using Deep-Rgcns, Nagasai Chandra Mar 2021

Node Classification On Relational Graphs Using Deep-Rgcns, Nagasai Chandra

Master's Theses

Knowledge Graphs are fascinating concepts in machine learning as they can hold usefully structured information in the form of entities and their relations. Despite the valuable applications of such graphs, most knowledge bases remain incomplete. This missing information harms downstream applications such as information retrieval and opens a window for research in statistical relational learning tasks such as node classification and link prediction. This work proposes a deep learning framework based on existing relational convolutional (R-GCN) layers to learn on highly multi-relational data characteristic of realistic knowledge graphs for node property classification tasks. We propose a deep and improved variant, …


Maternal Proximity To Mountaintop Removal Mining And Birth Defects In Appalachian Kentucky, 1997-2003, Daniel B. Cooper Jan 2021

Maternal Proximity To Mountaintop Removal Mining And Birth Defects In Appalachian Kentucky, 1997-2003, Daniel B. Cooper

Theses and Dissertations--Public Health (M.P.H. & Dr.P.H.)

Background: Extraction of coal through mountaintop removal mining (MTR) alters many dimensions of the landscape, and explosive blasts, exposed rock, and coal washing have the potential to pollute air and water with substances known to increase risk of developmental and birth anomalies. Previous research suggests that infants born to mothers living in MTR coal mining counties have higher prevalence of most types of birth defects.

Objectives: This study seeks to examine further the relationship between MTR activity and birth defects by employing individual level exposure estimation through precise satellite data of MTR activity in the Appalachian region and maternal residence …


Integrating Data Science Ethics Into An Undergraduate Major, Benjamin Baumer, Randi L. Garcia, Albert Y. Kim, Katherine M. Kinnaird, Miles Q. Ott Jul 2020

Integrating Data Science Ethics Into An Undergraduate Major, Benjamin Baumer, Randi L. Garcia, Albert Y. Kim, Katherine M. Kinnaird, Miles Q. Ott

Statistical and Data Sciences: Faculty Publications

We present a programmatic approach to incorporating ethics into an undergraduate major in statistical and data sciences. We discuss departmental-level initiatives designed to meet the National Academy of Sciences recommendation for weaving ethics into the curriculum from top-to-bottom as our majors progress from our introductory courses to our senior capstone course, as well as from side-to-side through co-curricular programming. We also provide six examples of data science ethics modules used in five different courses at our liberal arts college, each focusing on a different ethical consideration. The modules are designed to be portable such that they can be flexibly incorporated …


Next-Term Grade Prediction: A Machine Learning Approach, Audrey Tedja Widjaja, Lei Wang, Nghia Truong Trong, Aldy Gunawan, Ee-Peng Lim Jul 2020

Next-Term Grade Prediction: A Machine Learning Approach, Audrey Tedja Widjaja, Lei Wang, Nghia Truong Trong, Aldy Gunawan, Ee-Peng Lim

Research Collection School Of Computing and Information Systems

As students progress in their university programs, they have to face many course choices. It is important for them to receive guidance based on not only their interest, but also the "predicted" course performance so as to improve learning experience and optimise academic performance. In this paper, we propose the next-term grade prediction task as a useful course selection guidance. We propose a machine learning framework to predict course grades in a specific program term using the historical student-course data. In this framework, we develop the prediction model using Factorization Machine (FM) and Long Short Term Memory combined with FM …


Analysis Of Gameplay Strategies In Hearthstone: A Data Science Approach, Connor W. Watson May 2020

Analysis Of Gameplay Strategies In Hearthstone: A Data Science Approach, Connor W. Watson

Theses

In recent years, games have been a popular test bed for AI research, and the presence of Collectible Card Games (CCGs) in that space is still increasing. One such CCG for both competitive/casual play and AI research is Hearthstone, a two-player adversarial game where players seeks to implement one of several gameplay strategies to defeat their opponent and decrease all of their Health points to zero. Although some open source simulators exist, some of their methodologies for simulated agents create opponents with a relatively low skill level. Using evolutionary algorithms, this thesis seeks to evolve agents with a higher skill …


Decision Tree For Predicting The Party Of Legislators, Afsana Mimi May 2020

Decision Tree For Predicting The Party Of Legislators, Afsana Mimi

Publications and Research

The motivation of the project is to identify the legislators who voted frequently against their party in terms of their roll call votes using Office of Clerk U.S. House of Representatives Data Sets collected in 2018 and 2019. We construct a model to predict the parties of legislators based on their votes. The method we used is Decision Tree from Data Mining. Python was used to collect raw data from internet, SAS was used to clean data, and all other calculations and graphical presentations are performed using the R software.


First-Year Computer Science Students: Pathways And Perceptions In Introductory Computer Science Courses, Christina A. Leblanc May 2020

First-Year Computer Science Students: Pathways And Perceptions In Introductory Computer Science Courses, Christina A. Leblanc

Electronic Theses and Dissertations

This study examined student perceptions and experiences of an introductory Computer Science course at the University of Maine; COS 125: Introduction to Problem Solving Using Computer Programs. It also explored the pathways that students pursue after taking COS 125, depending on their success in the course, and their motivation to persist. Through characterizing student populations and their performance in their first semester in the Computer Science program, they can be placed into one of three categories that explain their path; a “continuer” (passed COS 125 and decided to stay in the major), a “persister” (did not pass COS 125 and …


Novel Inference Methods For Generalized Linear Models Using Shrinkage Priors And Data Augmentation., Arinjita Bhattacharyya May 2020

Novel Inference Methods For Generalized Linear Models Using Shrinkage Priors And Data Augmentation., Arinjita Bhattacharyya

Electronic Theses and Dissertations

Generalized linear models have broad applications in biostatistics and sociology. In a regression setup, the main target is to find a relevant set of predictors out of a large collection of covariates. Sparsity is the assumption that only a few of these covariates in a regression setup have a meaningful correlation with an outcome variate of interest. Sparsity is incorporated by regularizing the irrelevant slopes towards zero without changing the relevant predictors and keeping the resulting inferences intact. Frequentist variable selection and sparsity are addressed by popular techniques like Lasso, Elastic Net. Bayesian penalized regression can tackle the curse of …


Preparing For The Future: The Effects Of Financial Literacy On Financial Planning For Young Professionals, Tanay Singh Apr 2020

Preparing For The Future: The Effects Of Financial Literacy On Financial Planning For Young Professionals, Tanay Singh

Senior Theses

Purpose – Many people between the age of 20 and 34 have not considered planning financially for the future in any significant capacity and in doing so, they limit their potential savings. The purpose of this study is to examine what financial expectations are for people in the early stages of their career and determine if improving financial literacy and revealing financial realities helps to produce more accurate or realistic expectations. Ultimately, the goal is to better prepare participants in the study for the working world and increased responsibilities outside of the college/university environment by getting them to start thinking …


Using Alteryx Designer In Audit, Nolan Asiala Apr 2020

Using Alteryx Designer In Audit, Nolan Asiala

Honors Projects

My senior project was built around data analysis and how it relates to the auditing profession. Initially, I was planning on attending a data analytics competition, but that was canceled due to the events of COVID-19. This project utilized the Alteryx Designer program to demonstrate how it can be used during an audit engagement. By creating a workflow in Alteryx Designer, a report from a client can be cleaned and reformatted into a working dataset. My project includes two Excel files, a Microsoft Word document that serves as a brief introduction to the program, and a video describing the workflow …


Evaluation Of Text Mining Techniques Using Twitter Data For Hurricane Disaster Resilience, Joshua Eason, Sathish Kumar Feb 2020

Evaluation Of Text Mining Techniques Using Twitter Data For Hurricane Disaster Resilience, Joshua Eason, Sathish Kumar

SDSU Data Science Symposium

Data obtained from social media microblogging websites such as Twitter provide the unique ability to collect and analyze conversations of the public in order to gain perspective on the thoughts and feelings of the general public. Sentiment and volume analysis techniques were applied to the dataset in order to gain an understanding of the amount and level of sentiment associated with certain disaster-related tweets, including a topical analysis of specific terms. This study showed that disaster-type events such as a hurricane can cause some strong negative sentiment in the period of time directly preceding the event, but ultimately returns quickly …


Allocative Poisson Factorization For Computational Social Science, Aaron Schein Jul 2019

Allocative Poisson Factorization For Computational Social Science, Aaron Schein

Doctoral Dissertations

Social science data often comes in the form of high-dimensional discrete data such as categorical survey responses, social interaction records, or text. These data sets exhibit high degrees of sparsity, missingness, overdispersion, and burstiness, all of which present challenges to traditional statistical modeling techniques. The framework of Poisson factorization (PF) has emerged in recent years as a natural way to model high-dimensional discrete data sets. This framework assumes that each observed count in a data set is a Poisson random variable $y ~ Pois(\mu)$ whose rate parameter $\mu$ is a function of shared model parameters. This thesis examines a specific …


Field Drilling Data Cleaning And Preparation For Data Analytics Applications, Daniel Cardoso Braga Jun 2019

Field Drilling Data Cleaning And Preparation For Data Analytics Applications, Daniel Cardoso Braga

LSU Master's Theses

Throughout the history of oil well drilling, service providers have been continuously striving to improve performance and reduce total drilling costs to operating companies. Despite constant improvement in tools, products, and processes, data science has not played a large part in oil well drilling. With the implementation of data science in the energy sector, companies have come to see significant value in efficiently processing the massive amounts of data produced by the multitude of internet of thing (IOT) sensors at the rig. The scope of this project is to combine academia and industry experience to analyze data from 13 different …