Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 11 of 11

Full-Text Articles in Physical Sciences and Mathematics

Creating Data From Unstructured Text With Context Rule Assisted Machine Learning (Craml), Stephen Meisenbacher, Peter Norlander Dec 2022

Creating Data From Unstructured Text With Context Rule Assisted Machine Learning (Craml), Stephen Meisenbacher, Peter Norlander

School of Business: Faculty Publications and Other Works

Popular approaches to building data from unstructured text come with limitations, such as scalability, interpretability, replicability, and real-world applicability. These can be overcome with Context Rule Assisted Machine Learning (CRAML), a method and no-code suite of software tools that builds structured, labeled datasets which are accurate and reproducible. CRAML enables domain experts to access uncommon constructs within a document corpus in a low-resource, transparent, and flexible manner. CRAML produces document-level datasets for quantitative research and makes qualitative classification schemes scalable over large volumes of text. We demonstrate that the method is useful for bibliographic analysis, transparent analysis of proprietary data, …


Investigating Bloom's Cognitive Skills In Foundation And Advanced Programming Courses From Students' Discussions, Joel Jer Wei Lim, Gottipati Swapna, Kyong Jin Shim Nov 2022

Investigating Bloom's Cognitive Skills In Foundation And Advanced Programming Courses From Students' Discussions, Joel Jer Wei Lim, Gottipati Swapna, Kyong Jin Shim

Research Collection School Of Computing and Information Systems

Programming courses provide students with the skills to develop complex business applications. Teaching and learning programming is challenging, and collaborative learning is proposed to help with this challenge. Online discussion forums promote networking with other learners such that they can build knowledge collaboratively. It aids students open their horizons of thought processes to acquire cognitive skills. Cognitive analysis of discussion is critical to understand students' learning process. In this paper, we propose Bloom's taxonomy based cognitive model for programming discussion forums. We present machine learning (ML) based solution to extract students' cognitive skills. Our evaluations on compupting courses show that …


Understanding Learners' Motivation Through Machine Learning Analysis On Reflection Writing, Elizabeth Pluskwik, Yuezhou Wang, Lauren Singelmann Aug 2022

Understanding Learners' Motivation Through Machine Learning Analysis On Reflection Writing, Elizabeth Pluskwik, Yuezhou Wang, Lauren Singelmann

Integrated Engineering Department Publications

Educational data mining (EDM) is an emerging interdisciplinary field that utilizes a machine learning (ML) algorithm to collect and analyze educational data, aiming to better predict students' performance and retention. In this WIP paper, we report our methodology and preliminary results from utilizing a ML program to assess students’ motivation through their upper-division years in the XYZ project-based learning (PBL) program. ML, or more specifically, the clustering algorithm, opens the door to processing large amounts of student-written artifacts, such as reflection journals, project reports, and written assignments, and then identifies keywords that signal their levels of motivation (i.e., extrinsic vs. …


Investigating Toxicity Changes Of Cross-Community Redditors From 2 Billion Posts And Comments, Hind Almerekhi, Haewoon Kwak, Bernard J. Jansen Aug 2022

Investigating Toxicity Changes Of Cross-Community Redditors From 2 Billion Posts And Comments, Hind Almerekhi, Haewoon Kwak, Bernard J. Jansen

Research Collection School Of Computing and Information Systems

This research investigates changes in online behavior of users who publish in multiple communities on Reddit by measuring their toxicity at two levels. With the aid of crowdsourcing, we built a labeled dataset of 10,083 Reddit comments, then used the dataset to train and fine-tune a Bidirectional Encoder Representations from Transformers (BERT) neural network model. The model predicted the toxicity levels of 87,376,912 posts from 577,835 users and 2,205,581,786 comments from 890,913 users on Reddit over 16 years, from 2005 to 2020. This study utilized the toxicity levels of user content to identify toxicity changes by the user within the …


Structure-Aware Visualization Retrieval, Haotian Li, Yong Wang, Aoyu Wu, Huan Wei, Huamin. Qu May 2022

Structure-Aware Visualization Retrieval, Haotian Li, Yong Wang, Aoyu Wu, Huan Wei, Huamin. Qu

Research Collection School Of Computing and Information Systems

With the wide usage of data visualizations, a huge number of Scalable Vector Graphic (SVG)-based visualizations have been created and shared online. Accordingly, there has been an increasing interest in exploring how to retrieve perceptually similar visualizations from a large corpus, since it can benefit various downstream applications such as visualization recommendation. Existing methods mainly focus on the visual appearance of visualizations by regarding them as bitmap images. However, the structural information intrinsically existing in SVG-based visualizations is ignored. Such structural information can delineate the spatial and hierarchical relationship among visual elements, and characterize visualizations thoroughly from a new perspective. …


Automated Identification Of Libraries From Vulnerability Data: Can We Do Better?, Stefanus A. Haryono, Hong Jin Kang, Abhishek Sharma, Asankhaya Sharma, Andrew E. Santosa, Ming Yi Ang, David Lo May 2022

Automated Identification Of Libraries From Vulnerability Data: Can We Do Better?, Stefanus A. Haryono, Hong Jin Kang, Abhishek Sharma, Asankhaya Sharma, Andrew E. Santosa, Ming Yi Ang, David Lo

Research Collection School Of Computing and Information Systems

Software engineers depend heavily on software libraries and have to update their dependencies once vulnerabilities are found in them. Software Composition Analysis (SCA) helps developers identify vulnerable libraries used by an application. A key challenge is the identification of libraries related to a given reported vulnerability in the National Vulnerability Database (NVD), which may not explicitly indicate the affected libraries. Recently, researchers have tried to address the problem of identifying the libraries from an NVD report by treating it as an extreme multi-label learning (XML) problem, characterized by its large number of possible labels and severe data sparsity. As input, …


A Predictive Model To Predict Cyberattack Using Self-Normalizing Neural Networks, Oluwapelumi Eniodunmo Jan 2022

A Predictive Model To Predict Cyberattack Using Self-Normalizing Neural Networks, Oluwapelumi Eniodunmo

Theses, Dissertations and Capstones

Cyberattack is a never-ending war that has greatly threatened secured information systems. The development of automated and intelligent systems provides more computing power to hackers to steal information, destroy data or system resources, and has raised global security issues. Statistical and Data mining tools have received continuous research and improvements. These tools have been adopted to create sophisticated intrusion detection systems that help information systems mitigate and defend against cyberattacks. However, the advancement in technology and accessibility of information makes more identifiable elements that can be used to gain unauthorized access to systems and resources. Data mining and classification tools …


Predictive Models In Software Engineering: Challenges And Opportunities, Yanming Yang, Xin Xia, David Lo, Tingting Bi, John C. Grundy, Xiaohu Yang Jan 2022

Predictive Models In Software Engineering: Challenges And Opportunities, Yanming Yang, Xin Xia, David Lo, Tingting Bi, John C. Grundy, Xiaohu Yang

Research Collection School Of Computing and Information Systems

Predictive models are one of the most important techniques that are widely applied in many areas of software engineering. There have been a large number of primary studies that apply predictive models and that present well-performed studies in various research domains, including software requirements, software design and development, testing and debugging, and software maintenance. This article is a first attempt to systematically organize knowledge in this area by surveying a body of 421 papers on predictive models published between 2009 and 2020. We describe the key models and approaches used, classify the different models, summarize the range of key application …


Predicting League Of Legends Ranked Games Outcome, Ngoc Linh Chi Nguyen Jan 2022

Predicting League Of Legends Ranked Games Outcome, Ngoc Linh Chi Nguyen

Senior Projects Spring 2022

League of Legends (LoL) is the one of most popular multiplayer online battle arena (MOBA) games in the world. For LoL, the most competitive way to evaluate a player’s skill level, below the professional Esports level, is competitive ranked games. These ranked games utilize a matchmaking system based on the player’s ranks to form a fair team for each game. However, a rank game's outcome cannot necessarily be predicted using just players’ ranks, there are a significant number of different variables impacting a rank game depending on how well each team plays. In this paper, I propose a method to …


Machine Learning In Requirements Elicitation: A Literature Review, Cheligeer Cheligeer, Jingwei Huang, Guosong Wu, Nadia Bhuiyan, Yuan Xu, Yong Zeng Jan 2022

Machine Learning In Requirements Elicitation: A Literature Review, Cheligeer Cheligeer, Jingwei Huang, Guosong Wu, Nadia Bhuiyan, Yuan Xu, Yong Zeng

Engineering Management & Systems Engineering Faculty Publications

A growing trend in requirements elicitation is the use of machine learning (ML) techniques to automate the cumbersome requirement handling process. This literature review summarizes and analyzes studies that incorporate ML and natural language processing (NLP) into demand elicitation. We answer the following research questions: (1) What requirement elicitation activities are supported by ML? (2) What data sources are used to build ML-based requirement solutions? (3) What technologies, algorithms, and tools are used to build ML-based requirement elicitation? (4) How to construct an ML-based requirements elicitation method? (5) What are the available tools to support ML-based requirements elicitation methodology? Keywords …


Non-Parametric Stochastic Autoencoder Model For Anomaly Detection, Raphael B. Alampay, Patricia Angela R. Abu Jan 2022

Non-Parametric Stochastic Autoencoder Model For Anomaly Detection, Raphael B. Alampay, Patricia Angela R. Abu

Department of Information Systems & Computer Science Faculty Publications

Anomaly detection is a widely studied field in computer science with applications ranging from intrusion detection, fraud detection, medical diagnosis and quality assurance in manufacturing. The underlying premise is that an anomaly is an observation that does not conform to what is considered to be normal. This study addresses two major problems in the field. First, anomalies are defined in a local context, that is, being able to give quantitative measures as to how anomalies are categorized within its own problem domain and cannot be generalized to other domains. Commonly, anomalies are measured according to statistical probabilities relative to the …