Open Access. Powered by Scholars. Published by Universities.®

Computer Engineering Commons

Articles 1 - 18 of 18

Full-Text Articles in Computer Engineering

Learning Relation Prototype From Unlabeled Texts For Long-Tail Relation Extraction, Yixin Cao, Jun Kuang, Ming Gao, Aoying Zhou, Yonggang Wen, Tat-Seng Chua Feb 2023

Research Collection School Of Computing and Information Systems

Relation Extraction (RE) is a vital step in completing a Knowledge Graph (KG) by extracting entity relations from texts. However, it usually suffers from the long-tail issue: the training data mainly concentrate on a few types of relations, leaving the remaining types without sufficient annotations. In this paper, we propose a general approach to learn relation prototypes from unlabeled texts, to facilitate long-tail relation extraction by transferring knowledge from the relation types with sufficient training data. We learn relation prototypes as an implicit factor between entities, which reflects the meanings of relations as well …


Mitigating Popularity Bias In Recommendation With Unbalanced Interactions: A Gradient Perspective, Weijieying Ren, Lei Wang, Kunpeng Liu, Ruocheng Guo, Ee-Peng Lim, Yanjie Fu Dec 2022

Research Collection School Of Computing and Information Systems

Recommender systems learn from historical user-item interactions to identify preferred items for target users. These observed interactions are usually unbalanced, following a long-tailed distribution. Such long-tailed data lead to popularity bias, in which popular but not personalized items are recommended to users. We present a gradient perspective to understand two negative impacts of popularity bias in recommendation model optimization: (i) the gradient direction of popular item embeddings is closer to that of positive interactions, and (ii) the magnitude of the positive gradient for popular items is much greater than that of unpopular items. To address these issues, we propose a simple yet efficient …


Messiness: Automating IoT Data Streaming Spatial Analysis, Christopher White, Atilio Barreda II Dec 2021

Publications and Research

The spaces we live in go through many transformations over the course of a year, a month, or a day. My room has seen tremendous clutter and pristine order within the span of a few hours. My goal is to discover patterns within my space and formulate an understanding of the changes that occur. This insight will provide actionable direction for maintaining a cleaner environment, as well as some information about the optimal times for productivity and energy preservation.

Using a Raspberry Pi, I will set up automated image capture in a room in my home. These images will …


Data Mining Of Unstructured Textual Information In Transportation Safety Domain: Exploring Methods, Opportunities And Limitations, Keneth Morgan Kwayu Jun 2021

Dissertations

The unprecedented increase in volume and influx of structured and unstructured data has overwhelmed conventional data management system capabilities in organizing, analyzing, and procuring useful information in a timely fashion. Structured data sources have a pre-defined pattern that makes data preprocessing and information retrieval tasks relatively easy for current technologies, which have been designed to handle structured and repeatable data. Unlike structured data, unstructured data usually exists in an unorganized format that offers little or no insight unless indexed and stored in an organized fashion. The inherent format of unstructured data exacerbates difficulties in data preprocessing and information extraction. …


A Direct Data-Cluster Analysis Method Based On Neutrosophic Set Implication, Florentin Smarandache, Sudan Jha, Gyanendra Prasad Joshi, Lewis Nkenyereya, Dae Wan Kim Jan 2020

Branch Mathematics and Statistics Faculty and Staff Publications

Raw data are classified using clustering techniques in a reasonable manner to create disjoint clusters. A lot of clustering algorithms based on specific parameters have been proposed to access a high volume of datasets. This paper focuses on cluster analysis based on neutrosophic set implication, i.e., a k-means algorithm with a threshold-based clustering technique. This algorithm addresses the shortcomings of the k-means clustering algorithm by overcoming the limitations of the threshold-based clustering algorithm. To evaluate the validity of the proposed method, several validity measures and validity indices are applied to the Iris dataset (from the University of California, Irvine, Machine …


On I/O Performance And Cost Efficiency Of Cloud Storage: A Client's Perspective, Binbing Hou Nov 2019

LSU Doctoral Dissertations

Cloud storage has gained increasing popularity in the past few years. In cloud storage, data are stored in the service provider’s data centers; users access data via the network and pay fees based on service usage. For such a new storage model, our prior wisdom and optimization schemes for conventional storage may no longer be valid or applicable.

In this dissertation, we focus on understanding and optimizing the I/O performance and cost efficiency of cloud storage from a client’s perspective. We first conduct a comprehensive study to gain insight into the I/O performance behaviors …


Mining Capstone Project Wikis For Knowledge Discovery, Swapna Gottipati, Venky Shankararaman, Melvrivk Goh Jul 2017

Research Collection School Of Computing and Information Systems

Wikis are widely used collaborative environments that serve as sources of information and knowledge. They help students collaborate, share information among members, and learn collaboratively. In particular, Wikis play an important role in capstone projects: they aid in various project-related tasks and help organize and share information. Mining project Wikis is critical to understanding students' learning and the latest trends in industry. Mining Wikis is useful to educationists and academicians for decision-making about how to modify the educational environment to improve students' learning. The main challenge is that the content or data in project Wikis …


Prediction And Recommendations On The IT Leaners' Learning Path As A Collective Intelligence Using A Data Mining Technique, Seong-Yong Hong, Juyun Cho, Yonghyun Hwang Oct 2016

Journal of International Technology and Information Management

With recent advances in computer technology and pervasive internet access, data analytics is getting more attention than ever before. In addition, research on data analysis is diversifying and integrating many different fields, such as the business and social sectors. In particular, recent research focuses on data analysis for better intelligent decision-making and prediction systems. This paper analyzes data collected from current IT learners who have already studied various IT subjects to find the IT learners’ learning patterns. The most popular learning patterns are identified through association rule data mining using an arules package …


Understanding The Relationships Between Sanitation And Health In Nicaragua And Honduras, Through Data Mining Tools, Ginevra Marina Lazerini, Josep Nualart, Sergio Ruiz-Cayuela, Maialen Urbina, Miquel Sànchez Marrè, Karina Gibert Jul 2016

International Congress on Environmental Modelling and Software

The aim of this work is to analyze water and sanitation supply data from Nicaragua and Honduras using different data mining tools. The data were provided by SIASAR (Rural Water and Sanitation Information System), a water and sanitation management and information platform created through the joint effort of different Central American governments and the World Bank. The study analyzes data from a survey performed in all the rural communities in Nicaragua and in a sample of the rural communities in Honduras from 2012 to 2015. The database contains 10206 communities described by 23 numerical variables …


Learning On The Relationships Between Respiratory Desease And The Use Of Traditional Stoves In Bangladesh Households, Camila Vergara, Iñigo Arregui, Alain Balaguer, Tamia Gómez, Carmen Sandoval, Miquel Sànchez Marrè, Karina Gibert Jul 2016

International Congress on Environmental Modelling and Software

More than 4 million people die prematurely every year from diseases related to indoor air pollution produced by solid fuels used in cooking (WHO, 2016, Jones 1999), fifty thousand of them in Bangladesh (News Medical, 2012), with women and children the most affected. The risk of pneumonia is high due to the irritants, toxins and carcinogens released into the air by the incomplete combustion of solid fuels (biomass) used in traditional stoves (WHO 2016), which produce PM10 (particulate matter small enough (≤10μm) to get into the lungs). An open database from the World Bank (WHO, 2016) (Dasgupta et al 2006) describing a …


Socio Environmental Conflicts In Ecuador. The Use Of Preprocessing And Data Mining To Detect Influencing Factors On Violence And Crisis (1985 - 2016), Lina Pita Merino, Martí Rosas-Casals, Karina Gibert Jul 2016

International Congress on Environmental Modelling and Software

The main concern regarding the spread of Socio Environmental Conflicts (SEC) is the constant increase of extractive activities to support the economic system. Conflict originating in the clash of interests between the extractive industries and local populations is the most visible outcome, but the complexity of this phenomenon may not be that obvious. Among South American countries, the highest murder rates of environmental activists corresponded to Brazil, Peru and Colombia, three of the four Amazonian countries along with Ecuador (Global Witness, 2015). In addition, all of them have similar characteristics such as high levels of inequality and the presence of …


Evaluation Of Classification And Ensemble Algorithms For Bank Customer Marketing Response Prediction, Olatunji Apampa Jan 2016

Journal of International Technology and Information Management

This article attempts to improve the performance of classification algorithms used in the bank customer marketing response prediction of an unnamed Portuguese bank using the Random Forest ensemble. A thorough exploratory data analysis (EDA) was conducted on the data in order to ascertain the presence of anomalies such as outliers and extreme values. The EDA revealed that the bank data had 45,211 instances and 17 features, with 11.7% positive responses. This was in addition to the detection of outliers and extreme values. Classification algorithms used for modelling the bank dataset include Logistic Regression, Decision Tree, Naïve Bayes and the …


Where Are We In Wastewater Treatment Plants Data Management? A Review And A Proposal, Manel Poch, Joaquim Comas, José Porro, Manel Garrido-Baserba, Lluis Corominas, Maite Pijuan Jun 2014

International Congress on Environmental Modelling and Software

Wastewater treatment plants (WWTP) comprise complex processes that need to be optimally managed. To attain that, in recent years an impressive effort has been made to incorporate monitoring devices able to provide from several hundred to more than ten thousand signals. With the aim of taking advantage of those data, different data mining techniques have been applied to transform them into information and knowledge in order to help WWTP managers. Furthermore, several mathematical models have been developed to simulate process behaviour, including biomass and pollutant transformation. However, it is recognized that this is not enough …


Pre-Processing Techniques Applied To Automatic Taxon Identification On Fish Otoliths, Ramon Reig-Bolaño, Pere Marti-Puig Jun 2014

International Congress on Environmental Modelling and Software

This paper analyzes the characteristics of a rotation-invariant feature space to be used in a classifier of fish otoliths. It is compared to two other feature spaces, one with raw data and another with transformed data (using Elliptic Fourier Descriptors, EFD). Otoliths are found in the inner ear of fish. Their shape can be analyzed to determine sex, age, populations and species, and thus they can provide necessary and relevant information for ecological studies. The Automatic Taxon Identifier (ATI) is used to classify fish otoliths directly from a query image and is implemented on-line in a Public Database. This …


Predicting SQL Injection And Cross Site Scripting Vulnerabilities Through Mining Input Sanitization Patterns, Lwin Khin Shar, Hee Beng Kuan Tan Apr 2013

Research Collection School Of Computing and Information Systems

Context: SQL injection (SQLI) and cross site scripting (XSS) have been the two most common and serious web application vulnerabilities for the past decade. To mitigate these two security threats, many vulnerability detection approaches based on static and dynamic taint analysis techniques have been proposed. Alternatively, there are also vulnerability prediction approaches based on machine learning techniques, which showed that static code attributes such as code complexity measures are cheap and useful predictors. However, current prediction approaches target general vulnerabilities, and most of these approaches locate vulnerable code only at software component or file levels. Some approaches also involve process attributes that …


An Efficient Algorithm To Solve High-Dimensional Data Clustering: Candidate Subspace Clustering Algorithm, Chin-Chieh Kao Jan 2013

Theses Digitization Project

For this project, a comprehensive literature review on high-dimensional data clustering is conducted and a novel density-algorithm to perform high-dimensional data clustering is developed.


Measuring Merci: Exploring Data Mining Techniques For Examining Surgical Outcomes Of Stroke Patients, Matthew Ronald Mcnabb Aug 2012

Masters Theses and Doctoral Dissertations

Mechanical Embolus Removal in Cerebral Ischemia (MERCI) has been supported by medical trials as an improved method of treating ischemic stroke past the safe window of time for administering clot-busting drugs, and was released for medical use in 2004. Analyzing real-world data collected from MERCI clinical trials is key to providing insights into the effectiveness of MERCI. Most of the existing data analysis on MERCI results has thus far employed conventional statistical analysis techniques. To the best of the knowledge acquired in preliminary research, advanced data analytics and data mining techniques have not yet been systematically applied. …


An Interactive Visualization Model For Analyzing Data Storage System Workloads, Steven Charubhat Pungdumri Mar 2012

Master's Theses

The performance of hard disks has become increasingly important as the volume of data storage increases. At the bottom level of large-scale storage networks is the hard disk. Despite the importance of hard drives in a storage network, it is often difficult to analyze the performance of hard disks due to the sheer size of the datasets seen by hard disks. Additionally, hard drive workloads can have several multi-dimensional characteristics, such as access time, queue depth and block-address space. The result is that hard drive workloads are extremely diverse and large, making extracting meaningful information from hard drive workloads very …