Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Articles 1 - 12 of 12

Full-Text Articles in Computer Sciences

Mitigating Popularity Bias In Recommendation With Unbalanced Interactions: A Gradient Perspective, Weijieying Ren, Lei Wang, Kunpeng Liu, Ruocheng Guo, Ee-Peng Lim, Yanjie Fu Dec 2022

Research Collection School Of Computing and Information Systems

Recommender systems learn from historical user-item interactions to identify preferred items for target users. These observed interactions are usually unbalanced and follow a long-tailed distribution. Such long-tailed data lead to popularity bias, in which models recommend popular rather than personalized items to users. We present a gradient perspective to understand two negative impacts of popularity bias in recommendation model optimization: (i) the gradient direction of popular item embeddings is closer to that of positive interactions, and (ii) the magnitude of the positive gradient for popular items is much greater than that for unpopular items. To address these issues, we propose a simple yet efficient …
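
The second observation in the abstract above can be illustrated with a toy calculation. The sketch below is not the authors' model or method: it assumes a plain dot-product scorer and the loss L = -sum log sigmoid(e_u . e_i) over observed (user, item) pairs, and the function name `positive_gradient_norms` and the example data are hypothetical. Under those assumptions, the gradient with respect to an item embedding is a sum of one term per interaction, so an item with more interactions tends to accumulate a larger gradient.

```python
import math


def positive_gradient_norms(user_emb, item_emb, interactions):
    """Return the L2 norm of dL/d(e_i) for each item i.

    Assumed loss (illustrative only): L = -sum log sigmoid(e_u . e_i)
    over the observed (user, item) pairs in `interactions`.
    """
    dim = len(next(iter(item_emb.values())))
    grads = {item: [0.0] * dim for item in item_emb}
    for user, item in interactions:
        e_u, e_i = user_emb[user], item_emb[item]
        score = sum(a * b for a, b in zip(e_u, e_i))
        weight = 1.0 - 1.0 / (1.0 + math.exp(-score))  # 1 - sigmoid(score)
        for k in range(dim):
            # d/d(e_i) of -log sigmoid(e_u . e_i) is -(1 - sigmoid) * e_u
            grads[item][k] -= weight * e_u[k]
    return {
        item: math.sqrt(sum(g * g for g in grad))
        for item, grad in grads.items()
    }


# Hypothetical toy data: three users interact with "head", one with "tail".
users = {"u1": [1.0, 0.0], "u2": [1.0, 0.0], "u3": [1.0, 0.0]}
items = {"head": [0.0, 0.0], "tail": [0.0, 0.0]}
pairs = [("u1", "head"), ("u2", "head"), ("u3", "head"), ("u1", "tail")]
norms = positive_gradient_norms(users, items, pairs)
```

In this toy setting every interaction contributes the same term, so the norm for "head" is three times that for "tail". Real embeddings differ per user, so the ratio will not be exact.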


Hybrid Feature Selection Based On Principal Component Analysis And Grey Wolf Optimizer Algorithm For Arabic News Article Classification, Osama Ahmad Alomari, Ashraf Elnagar, Imad Afyouni, Ismail Shahin, Ali Bou Nassif, Ibrahim Abaker Hashem, Mohammad Tubishat Nov 2022

All Works

The rapid growth of electronic documents has resulted from the expansion and development of internet technologies. Text document classification is a key task in natural language processing that converts unstructured data into structured form and then extracts knowledge from it. This conversion generates high-dimensional data that need further analysis using data mining techniques such as feature extraction, feature selection, and classification to derive meaningful insights. Feature selection is a technique for reducing dimensionality in order to prune the feature space and, as a result, lower the computational cost and enhance classification accuracy. This work presents a …


An Empirical Study Of Blockchain System Vulnerabilities: Modules, Types, And Patterns, Xiao Yi, Daoyuan Wu, Lingxiao Jiang, Yuzhou Fang, Kehuan Zhang, Wei Zhang Nov 2022

Research Collection School Of Computing and Information Systems

Blockchain, as a distributed ledger technology, has become increasingly popular, especially for enabling valuable cryptocurrencies and smart contracts. However, blockchain software systems inevitably have many bugs. Although bugs in smart contracts have been extensively investigated, security bugs in the underlying blockchain systems are much less explored. In this paper, we conduct an empirical study of blockchain system vulnerabilities in four representative blockchains: Bitcoin, Ethereum, Monero, and Stellar. Specifically, we first design a systematic filtering process to effectively identify 1,037 vulnerabilities and their 2,317 patches from 34,245 issues/PRs (pull requests) and 85,164 commits on GitHub. We thus build the first blockchain …


Exploiting Reuse For GPU Subgraph Enumeration, Wentiao Guo, Yuchen Li, Kian-Lee Tan Sep 2022

Research Collection School Of Computing and Information Systems

Subgraph enumeration is important for many applications such as network motif discovery, community detection, and frequent subgraph mining. To accelerate execution, recent works use graphics processing units (GPUs) to parallelize subgraph enumeration. The performance of these parallel schemes is dominated by set intersection operations, which account for up to 95% of the total processing time. (Un)surprisingly, a significant portion (as high as 99%) of these operations is actually redundant, i.e., the same set of vertices is repeatedly encountered and evaluated. Therefore, in this paper, we seek to salvage and recycle the results of such operations to avoid repeated …
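
The redundancy described in the abstract above can be seen in a small CPU-only sketch. This is not the authors' GPU technique: it simply memoizes common-neighbour sets in a Python dictionary while counting 4-cliques, and the function name `count_4cliques`, the cache layout, and the hit/miss counters are all assumptions made for illustration. Vertices are assumed to be comparable values such as integers.

```python
def count_4cliques(adj):
    """Count 4-cliques in an undirected graph given as {vertex: set(neighbours)}.

    Common-neighbour sets N(a) & N(b) are cached per vertex pair so that a
    pair encountered again reuses the stored result instead of intersecting.
    """
    cache = {}
    stats = {"hits": 0, "misses": 0}

    def common(a, b):
        key = (a, b) if a < b else (b, a)
        if key in cache:
            stats["hits"] += 1
        else:
            stats["misses"] += 1
            cache[key] = adj[a] & adj[b]
        return cache[key]

    count = 0
    for u in sorted(adj):
        for v in sorted(adj[u]):
            if v <= u:
                continue
            uv = common(u, v)  # vertices adjacent to both u and v
            for w in sorted(uv):
                if w <= v:
                    continue
                # Vertices adjacent to u, v, and w; the (v, w) intersection
                # is often already cached from an earlier (u, v, w) triple.
                candidates = uv & common(v, w)
                count += sum(1 for x in candidates if x > w)
    return count, stats


# Complete graph on five vertices, used here only as a small test input.
k5 = {i: {j for j in range(5) if j != i} for i in range(5)}
total, reuse = count_4cliques(k5)
```

Requiring u < v < w < x counts each clique exactly once. The `reuse` counters show how many pair intersections were served from the cache rather than recomputed.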


Using Deep Learning To Detect Social Media ‘Trolls’, Áine Macdermott, Michal Motylinski, Farkhund Iqbal, Kellyann Stamp, Mohammed Hussain, Andrew Marrington Sep 2022

All Works

Detecting criminal activity online is not a new concept, but how it can occur is changing. Technology and the influx of social media applications and platforms have a vital part to play in this changing landscape. As such, we observe an increasing problem with cyber abuse and ‘trolling’/toxicity on social media platforms, where users share stories, posts, and memes. In this paper we present our work on the application of deep learning techniques for the detection of ‘trolls’ and toxic content shared on social media platforms. We propose a machine learning solution for the detection of toxic images based on embedded …


Generative Methods, Meta-Learning, And Meta-Heuristics For Robust Cyber Defense, Marc W. Chale Sep 2022

Theses and Dissertations

Cyberspace is the digital communications network that supports the internet of battlefield things (IoBT), the model by which defense-centric sensors, computers, actuators, and humans are digitally connected. A secure IoBT infrastructure facilitates real-time implementation of the observe, orient, decide, act (OODA) loop across distributed subsystems. Successful hacking efforts by cyber criminals and strategic adversaries suggest that cyber systems such as the IoBT are not secure. Three lines of effort demonstrate a path towards a more robust IoBT. First, a baseline data set of enterprise cyber network traffic was collected and modelled with generative methods, allowing the generation of realistic, …


Design And Analysis Of Strategic Behavior In Networks, Sixie Yu Aug 2022

McKelvey School of Engineering Theses & Dissertations

Networks permeate every aspect of our social and professional life. A networked system with strategic individuals can represent a variety of real-world scenarios with socioeconomic origins. In such a system, the individuals' utilities are interdependent: one individual's decision influences the decisions of others and vice versa. In order to gain insights into the system, the highly complicated interactions necessitate some level of abstraction. To capture the otherwise complex interactions, I use a game-theoretic model called the Networked Public Goods (NPG) game. I develop a computational framework based on NPGs to understand strategic individuals' behavior in networked systems. The framework consists of three …


Solving The Challenges Of Concept Drift In Data Stream Classification., Hanqing Hu Aug 2022

Electronic Theses and Dissertations

The rise of network-connected devices and applications has led to a significant increase in the volume of data that are continuously generated over time, called data streams. In real-world applications, storing the entirety of a data stream for later analysis is often not practical, due to the data stream’s potentially infinite volume. Data stream mining techniques and frameworks have therefore been created to analyze streaming data as they arrive. However, compared to traditional data mining techniques, challenges unique to data stream mining also emerge, due to the high arrival rate of data streams and their dynamic nature. In this dissertation, …


Design Demand Trend Acquisition Method Based On Short Text Mining Of User Comments In Shopping Websites, Zhiyong Xiong, Zhaoxiong Yan, Huanan Yao, Shangsong Liang Feb 2022

Machine Learning Faculty Publications

To help designers explore the market demand trend of laptops and to establish a better “network users-market feedback mechanism”, we propose a design and research method for a short text mining tool based on the K-means clustering algorithm and the Kano model. An improved short text clustering algorithm is used to extract the design elements of laptops. Based on the traditional questionnaire, we extract the user’s attention factors, score the emotional tendency, and analyze the user’s needs based on the Kano model. Then, we select 10 laptops, process them by the improved algorithm, cluster the evaluation words and …


SubOmiEmbed: Self-Supervised Representation Learning Of Multi-Omics Data For Cancer Type Classification, Sayed Hashim, Muhammad Ali, Karthik Nandakumar, Mohammad Yaqub Feb 2022

Computer Vision Faculty Publications

For personalized medicine, crucial intrinsic information is present in high-dimensional omics data, which is difficult to capture due to the large number of molecular features and the small number of available samples. Different types of omics data show various aspects of samples. Integration and analysis of multi-omics data give us a broad view of tumours, which can improve clinical decision making. Omics data, mainly DNA methylation and gene expression profiles, are usually high-dimensional data with many molecular features. In recent years, variational autoencoders (VAE) [13] have been extensively used in embedding image and text data into …


Data Science Applied To Discover Ancient Minoan-Indus Valley Trade Routes Implied By Common Weight Measures, Peter Revesz Jan 2022

CSE Conference and Workshop Papers

This paper applies data mining of weight measures to discover possible long-distance trade routes among Bronze Age civilizations from the Mediterranean area to India. As a result, a new northern route via the Black Sea is discovered between the Minoan and the Indus Valley civilizations. This discovery enhances the growing set of evidence for a strong and vibrant connection among Bronze Age civilizations.


Impact Of Sleep And Training On Game Performance And Injury In Division-1 Women’s Basketball Amidst The Pandemic, Samah Senbel, S. Sharma, S. M. Raval, Christopher B. Taber, Julie K. Nolan, N. S. Artan, Diala Ezzeddine, Kaya Tolga Jan 2022

School of Computer Science & Engineering Faculty Publications

We investigated the impact of sleep and training load of Division-1 women’s basketball players on their game performance and injury prediction using machine learning algorithms. The data were collected during a pandemic-condensed season with unpredictable interruptions to the games and athletic training schedules. We collected data from sleep monitoring devices, training data from coaches, injury reports from medical staff, and weekly survey data from athletes for 22 weeks. With proper data imputation, an interpretable feature set, data balancing, and classifiers, we showed that we could predict game performance and injuries with more than 90% accuracy. More importantly, our F1 and …