Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

Theory and Algorithms

2019

Articles 1 - 27 of 27

Full-Text Articles in Databases and Information Systems

IoMT Malware Detection Approaches: Analysis And Research Challenges, Mohammad Wazid, Ashok Kumar Das, Joel J.P.C. Rodrigues, Sachin Shetty, Youngho Park Dec 2019

VMASC Publications

The advancement in Information and Communications Technology (ICT) has changed the entire paradigm of computing. Because of such advancement, we have new types of computing and communication environments, for example, the Internet of Things (IoT), which is a collection of smart IoT devices. The Internet of Medical Things (IoMT) is a specific type of IoT communication environment that deals with communication through smart healthcare (medical) devices. Though the IoT communication environment facilitates and supports our day-to-day activities, it also has certain drawbacks, as it suffers from several security and privacy issues, such as replay, man-in-the-middle, impersonation, …


Identifying Regional Trends In Avatar Customization, Peter Mawhorter, Sercan Sengun, Haewoon Kwak, D. Fox Harrell Dec 2019

Research Collection School Of Computing and Information Systems

Since virtual identities such as social media profiles and avatars have become a common venue for self-expression, it has become important to consider the ways in which existing systems embed the values of their designers. In order to design virtual identity systems that reflect the needs and preferences of diverse users, it is important to understand how virtual identity construction differs between groups. This paper presents a new methodology that leverages deep learning and differential clustering for comparative analysis of profile images, with a case study of almost 100,000 avatars from a large online community using a popular avatar creation …


Hybrid Recommender Systems Via Spectral Learning And A Random Forest, Alyssa Williams Dec 2019

Electronic Theses and Dissertations

We demonstrate that spectral learning can be combined with a random forest classifier to produce a hybrid recommender system capable of incorporating meta information. Spectral learning is supervised learning in which data is in the form of one or more networks. Responses are predicted from features obtained from the eigenvector decomposition of matrix representations of the networks. Spectral learning is based on the highest-weight eigenvectors of natural Markov chain representations. A random forest is an ensemble technique for supervised learning whose internal predictive model can be interpreted as a nearest neighbor network. A hybrid recommender can be constructed by first …


On Finding Two Posets That Cover Given Linear Orders, Ivy Ordanel, Proceso L. Fernandez Jr, Henry Adorna Oct 2019

Department of Information Systems & Computer Science Faculty Publications

The Poset Cover Problem is an optimization problem where the goal is to determine a minimum set of posets that covers a given set of linear orders. This problem is relevant in the field of data mining, specifically in determining directed networks or models that explain the ordering of objects in a large sequential dataset. It is already known that the decision version of the problem is NP-Hard while its variation where the goal is to determine only a single poset that covers the input is in P. In this study, we investigate the variation, which we call the 2-Poset …
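The single-poset variation mentioned in this abstract can be illustrated with a small brute-force sketch. It is not the authors' algorithm and it is not polynomial: it assumes that a poset covers a set of linear orders when its linear extensions are exactly those orders, builds the candidate poset from the relations shared by every input order, and enumerates all permutations to check the cover. The function names are hypothetical.

```python
from itertools import permutations


def common_relations(orders):
    """Return pairs (a, b) such that a precedes b in every given linear order."""
    items = orders[0]
    positions = [{x: i for i, x in enumerate(order)} for order in orders]
    return {
        (a, b)
        for a in items
        for b in items
        if a != b and all(pos[a] < pos[b] for pos in positions)
    }


def linear_extensions(items, relations):
    """Enumerate every permutation of items that respects all relations (brute force)."""
    result = set()
    for perm in permutations(items):
        pos = {x: i for i, x in enumerate(perm)}
        if all(pos[a] < pos[b] for a, b in relations):
            result.add(perm)
    return result


def single_poset_cover(orders):
    """Return the relations of a single covering poset, or None if no single poset covers the input."""
    orders = [tuple(o) for o in orders]
    relations = common_relations(orders)
    # The candidate poset always admits the input orders; it covers them
    # only if it admits no additional linear extension.
    if linear_extensions(orders[0], relations) == set(orders):
        return relations
    return None
```

For example, the orders (1, 2, 3) and (1, 3, 2) are covered by the single poset in which 1 precedes both 2 and 3, whereas (1, 2, 3) and (3, 2, 1) share no relation, so their candidate poset admits all six permutations and the function returns None.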


Collaborative Online Ranking Algorithms For Multitask Learning, Guangxia Li, Peilin Zhao, Tao Mei, Peng Yang, Yulong Shen, Julian K. Y. Chang, Steven C. H. Hoi Oct 2019

Research Collection School Of Computing and Information Systems

There are many applications in which it is desirable to rank or order instances that belong to several different but related problems or tasks. Although unique, the individual ranking problem often shares characteristics with other problems in the group. Conventional ranking methods treat each task independently without considering the latent commonalities. In this paper, we study the problem of learning to rank instances that belong to multiple related tasks from the multitask learning perspective. We consider a case in which the information that is learned for a task can be used to enhance the learning of other tasks and propose …


Detecting Cyberattacks In Industrial Control Systems Using Online Learning Algorithms, Guangxia Li, Yulong Shen, Peilin Zhao, Xiao Lu, Jia Liu, Yangyang Liu, Steven C. H. Hoi Oct 2019

Research Collection School Of Computing and Information Systems

Industrial control systems are critical to the operation of industrial facilities, especially for critical infrastructures, such as refineries, power grids, and transportation systems. Similar to other information systems, a significant threat to industrial control systems is the attack from cyberspace: the offensive maneuvers launched by "anonymous" in the digital world that target computer-based assets with the goal of compromising a system's functions or probing for information. Owing to the importance of industrial control systems, and the possibly devastating consequences of being attacked, significant efforts have been made to secure industrial control systems from cyberattacks. Among them are intrusion detection systems that …


InSPeCT: Iterated Local Search For Solving Path Conditions, Fuxiang Chen, Aldy Gunawan, David Lo, Sunghun Kim Aug 2019

Research Collection School Of Computing and Information Systems

Automated test case generation is attractive as it can reduce developer workload. To generate test cases, many Symbolic Execution approaches first produce Path Conditions (PCs), a set of constraints, and pass them to a Satisfiability Modulo Theories (SMT) solver. Despite numerous prior studies, automated test case generation by Symbolic Execution is still slow, partly due to SMT solvers’ high computational complexity. We introduce InSPeCT, a Path Condition solver that leverages elements of ILS (Iterated Local Search) and Tabu List. ILS is not computationally intensive and focuses on generating solutions in search spaces while Tabu List prevents the use of previously …


Mathematical And Computer Simulation Of The Processes Of Two-Phase Joint Gas Filtration And Water In A Porous Environment, Elmira Nazirova Jul 2019

Bulletin of TUIT: Management and Communication Technologies

A mathematical model, methods and algorithms for the numerical solution of problems of joint gas-water filtration in porous media are considered. The mathematical model of the process of non-stationary joint gas-water filtration in a porous medium is described by a system of nonlinear differential equations of parabolic type. In the numerical solution of the boundary value problem of gas displacement by water in a porous medium, the differential sweeping method is used for systems of differential-difference equations. The system of differential-difference equations with respect to the gas pressure function is nonlinear, therefore, an iterative method is used for it, based …


Early Information Access To Alleviate Emergency Department Congestion, Anjee Gorkhali Jul 2019

Theses and Dissertations in Business Administration

Alleviating Emergency Department (ED) congestion results in shorter hospital stays, which not only reduces the cost of medical procedures but also increases hospital performance. Length of patient stay is used to determine hospital performance. Organization Information Processing Theory (OIPT) is used to explain the impact of information access and availability on the information processing needs and ability of a hospital. Technical devices such as RFID that work as “Auto Identification tags” are suggested to increase the information availability as well as the information processing capability of hospitals. This study suggests that the OIPT needs to be further …


REDPC: A Residual Error-Based Density Peak Clustering Algorithm, Milan Parmar, Di Wang, Xiaofeng Zhang, Ah-Hwee Tan, Chunyan Miao, You Zhou Jul 2019

Research Collection School Of Computing and Information Systems

The density peak clustering (DPC) algorithm was designed to identify arbitrary-shaped clusters by finding density peaks in the underlying dataset. Owing to its relatively low computational complexity and small number of control parameters, DPC soon became widely adopted. However, because DPC takes the entire data space into consideration during the computation of local density, which is then used to generate a decision graph for the identification of cluster centroids, DPC may face difficulty in differentiating overlapping clusters and in dealing with low-density data points. In this paper, we propose a residual error-based density peak clustering …
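The two quantities behind the decision graph of baseline DPC can be sketched as follows. This is a minimal illustration of the standard cutoff-kernel formulation, not of the residual error-based variant proposed in the paper: the local density of a point counts its neighbours closer than a cutoff `d_c`, and its separation is the distance to the nearest point of strictly higher density (or, for a point with no denser neighbour, its largest distance to any point). The function name and the cutoff value are assumptions chosen for the example.

```python
import math


def density_peaks(points, d_c):
    """Compute local density (rho) and separation (delta) for each point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Cutoff kernel: number of other points closer than d_c.
    rho = [
        sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
        for i in range(n)
    ]

    delta = []
    for i in range(n):
        # Distances to points with strictly higher density.
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A point with no denser neighbour gets its maximum distance instead.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta
```

Points with both a large density and a large separation are the candidate cluster centroids; an isolated point such as an outlier typically shows a large separation but a low density.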


Distributed Similarity Queries In Metric Spaces, Keyu Yang, Xin Ding, Yuanliang Zhang, Lu Chen, Baihua Zheng, Yunjun Gao Jun 2019

Research Collection School Of Computing and Information Systems

Similarity queries, including range queries and k nearest neighbor (kNN) queries, in metric spaces have applications in many areas such as multimedia retrieval, computational biology and location-based services. With the growing volumes of data, a distributed method is required. In this paper, we propose an Asynchronous Metric Distributed System (AMDS) to support efficient metric similarity queries in a distributed environment. AMDS uniformly partitions the data with the pivot-mapping technique to ensure load balancing, and employs a publish/subscribe communication model to asynchronously process queries at large scale. The asynchronous processing model also improves the robustness and efficiency of AMDS. In …
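The general idea of pivot mapping can be sketched on a single machine. The example below is generic pivot-based filtering for a range query, not the AMDS partitioning or its publish/subscribe layer: each object is mapped to its distances from a few pivots, and the triangle inequality lets a query discard any object whose pivot distance differs from the query's by more than the radius. The function names and the choice of pivots are assumptions.

```python
import math


def build_pivot_table(objects, pivots, dist):
    """Map every object to its vector of distances to the pivots."""
    return [[dist(o, p) for p in pivots] for o in objects]


def range_query(query, radius, objects, pivots, table, dist):
    """Return objects within radius of query, plus the number of exact distance checks."""
    q_to_pivots = [dist(query, p) for p in pivots]
    results = []
    verified = 0
    for obj, row in zip(objects, table):
        # Triangle inequality: |d(q, p) - d(o, p)| <= d(q, o) for every pivot p,
        # so a gap larger than the radius rules the object out without computing d(q, o).
        if any(abs(qp - op) > radius for qp, op in zip(q_to_pivots, row)):
            continue
        verified += 1
        if dist(query, obj) <= radius:
            results.append(obj)
    return results, verified
```

The filter never discards a true result; it only reduces how many exact distance computations are needed, which matters when the metric is expensive to evaluate.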


Socially-Enriched Multimedia Data Co-Clustering, Ah-Hwee Tan May 2019

Research Collection School Of Computing and Information Systems

Heterogeneous data co-clustering is a commonly used technique for tapping the rich meta-information of multimedia web documents, including category, annotation, and description, for associative discovery. However, most co-clustering methods proposed for heterogeneous data do not consider the representation problem of short and noisy text, and their performance is limited by the empirical weighting of the multimodal features. This chapter explains how to use the Generalized Heterogeneous Fusion Adaptive Resonance Theory (GHF-ART) for clustering large-scale web multimedia documents. Specifically, GHF-ART is designed to handle multimedia data with an arbitrarily rich level of meta-information. For handling …


Studying And Handling Iterated Algorithmic Biases In Human And Machine Learning Interaction, Wenlong Sun May 2019

Electronic Theses and Dissertations

Algorithmic bias consists of biased predictions born from ingesting unchecked information, such as biased samples and biased labels. Furthermore, the interaction between people and algorithms can exacerbate bias such that neither the human nor the algorithms receive unbiased data. Thus, algorithmic bias can be introduced not only before and after the machine learning process but sometimes also in the middle of the learning process. With a handful of exceptions, only a few categories of bias have been studied in Machine Learning, and there are few, if any, studies of the impact of bias on both human behavior and algorithm performance. …


Building Consumer Trust In The Cloud: An Experimental Analysis Of The Cloud Trust Label Approach, Lisa Van Der Werff, Grace Fox, Ieva Masevic, Vincent C. Emeakaroha, John P. Morrison, Theo Lynn Apr 2019

Department of Computer Science Publications

The lack of transparency surrounding cloud service provision makes it difficult for consumers to make knowledge-based purchasing decisions. As a result, consumer trust has become a major impediment to cloud computing adoption. Cloud Trust Labels represent a means of communicating relevant service and security information to potential customers on the cloud service provided, thereby facilitating informed decision making. This research investigates the potential of a Cloud Trust Label system to overcome the trust barrier. Specifically, it examines the impact of a Cloud Trust Label on consumer perceptions of a service and cloud service provider trustworthiness and trust in the …


Question Answering With Textual Sequence Matching, Shuohang Wang Apr 2019

Dissertations and Theses Collection (Open Access)

Question answering (QA) is one of the most important applications in natural language processing. With the explosive growth of text data on the Internet, intelligently finding answers to questions will help humans collect useful information more efficiently. My research in this thesis mainly focuses on solving the question answering problem with a textual sequence matching model, which builds vectorized representations for pairs of text sequences to enable better reasoning. The thesis consists of three major parts.

In Part I, we propose two general models for building vectorized representations over a pair of sentences, which can be directly used to solve the …


Efficient Algorithms For Solving Aggregate Keyword Routing Problems, Qize Jiang, Weiwei Sun, Baihua Zheng, Kunjie Chen Apr 2019

Research Collection School Of Computing and Information Systems

With the emergence of smart phones and the popularity of GPS, the number of points of interest (POIs) is growing rapidly and spatial keyword search based on POIs has attracted significant attention. In this paper, we study a more sophisticated type of spatial keyword search that considers multiple query points and multiple query keywords, namely Aggregate Keyword Routing (AKR). AKR looks for an aggregate point m together with routes from each query point to m. The aggregate point has to satisfy the aggregate keywords, the routes from query points to the aggregate point have to pass POIs in order to …


Maximizing Multifaceted Network Influence, Yuchen Li, Ju Fan, George V. Ovchinnikov, Panagiotis Karras Apr 2019

Research Collection School Of Computing and Information Systems

An information dissemination campaign is often multifaceted, involving several facets or pieces of information disseminating from different sources. The question then arises, how should we assign such pieces to eligible sources so as to achieve the best viral dissemination results? Past research has studied the problem of Influence Maximization (IM), which is to select a set of k promoters that maximizes the expected reach of a message over a network. However, in this classical IM problem, each promoter spreads out the same unitary piece of information. In this paper, we propose the Optimal Influential Pieces Assignment (OIPA) problem, which is …


Mirai Bot Scanner Summation Prototype, Charles V. Frank Jr. Mar 2019

Masters Theses & Doctoral Dissertations

The Mirai botnet deploys a distributed mechanism in which each Bot continually scans for a potential new Bot Victim. A Bot continually generates random IP addresses and scans the network to discover a potential new Bot Victim. The Bot establishes a connection with the potential new Bot Victim through a Transmission Control Protocol (TCP) handshake. The Mirai botnet has recruited hundreds of thousands of Bots. With 100,000 Bots, Mirai Distributed Denial of Service (DDoS) attacks on service provider Dyn in October 2016 made hundreds of websites in Europe and North America inaccessible (Sinanović & Mrdovic, 2017). A month …


Semantic And Influence Aware K-Representative Queries Over Social Streams, Yanhao Wang, Yuchen Li, Kianlee Tan Mar 2019

Research Collection School Of Computing and Information Systems

Massive volumes of data continuously generated on social platforms have become an important information source for users. A primary method to obtain fresh and valuable information from social streams is social search. Although there have been extensive studies on social search, existing methods only focus on the relevance of query results but ignore the representativeness. In this paper, we propose a novel Semantic and Influence aware k-Representative (k-SIR) query for social streams based on topic modeling. Specifically, we consider that both user queries and elements are represented as vectors in the topic space. A k-SIR query retrieves a set of …


DISH: Democracy In State Houses, Nicholas A. Russo Feb 2019

Master's Theses

In our current political climate, state level legislators have become increasingly important. Due to cuts in funding and growing focus at the national level, public oversight for these legislators has drastically decreased. This makes it difficult for citizens and activists to understand the relationships and commonalities between legislators. This thesis provides three contributions to address this issue. First, we created a data set containing over 1200 features focused on a legislator’s activity on bills. Second, we created embeddings that represented a legislator’s level of activity and engagement for a given bill using a custom model called Democracy2Vec. Third, we …


Send Hardest Problems My Way: Probabilistic Path Prioritization For Hybrid Fuzzing, Lei Zhao, Yue Duan, Jifeng Xuan Feb 2019

Research Collection School Of Computing and Information Systems

Hybrid fuzzing, which combines fuzzing and concolic execution, has become an advanced technique for software vulnerability detection. Based on the observation that fuzzing and concolic execution are complementary in nature, the state-of-the-art hybrid fuzzing systems deploy "demand launch" and "optimal switch" strategies. Although these ideas sound intriguing, we point out several fundamental limitations in them, due to oversimplified assumptions. We then propose a novel "discriminative dispatch" strategy to better utilize the capability of concolic execution. We design a novel Monte Carlo based probabilistic path prioritization model to quantify each path's difficulty and prioritize them for concolic execution. This model treats …


Improving VIX Futures Forecasts Using Machine Learning Methods, James Hosker, Slobodan Djurdjevic, Hieu Nguyen, Robert Slater Jan 2019

SMU Data Science Review

The problem of forecasting market volatility is a difficult task for most fund managers. Volatility forecasts are used for risk management, alpha (risk) trading, and the reduction of trading friction. Improving the forecasts of future market volatility assists fund managers in adding or reducing risk in their portfolios as well as in increasing hedges to protect their portfolios in anticipation of a market sell-off event. Our analysis compares three existing financial models that forecast future market volatility using the Chicago Board Options Exchange Volatility Index (VIX) to six machine/deep learning supervised regression methods. This analysis determines which models provide best …


Absorption Calculator: A Cross-Platform Application For Portable Data Analysis, Annmarie Kolbl Jan 2019

Williams Honors College, Honors Research Projects

Traditional spectrometers are expensive and non-portable, making them inaccessible to the public. This application will be used in conjunction with spectrometer hardware developed by Erie Open Systems. The hardware itself is 3D printed and, in addition to being portable, enables data to be collected easily. The purpose of this project is to create a cross-platform application capable of reading the output from the spectrometer hardware, calculating the absorbance levels of the sample against the control, and recording the data in tables stored on the cloud. The end result will be an application that runs on iOS and Android, and is …


Learning To Map The Visual And Auditory World, Tawfiq Salem Jan 2019

Theses and Dissertations--Computer Science

The appearance of the world varies dramatically not only from place to place but also from hour to hour and month to month. Billions of images that capture this complex relationship are uploaded to social-media websites every day and often are associated with precise time and location metadata. This rich source of data can be beneficial to improve our understanding of the globe. In this work, we propose a general framework that uses these publicly available images for constructing dense maps of different ground-level attributes from overhead imagery. In particular, we use well-defined probabilistic models and a weakly-supervised, multi-task training …


The Global Disinformation Order: 2019 Global Inventory Of Organised Social Media Manipulation, Samantha Bradshaw, Philip N. Howard Jan 2019

Copyright, Fair Use, Scholarly Communication, etc.

Executive Summary

Over the past three years, we have monitored the global organization of social media manipulation by governments and political parties. Our 2019 report analyses the trends of computational propaganda and the evolving tools, capacities, strategies, and resources.

1. Evidence of organized social media manipulation campaigns which have taken place in 70 countries, up from 48 countries in 2018 and 28 countries in 2017. In each country, there is at least one political party or government agency using social media to shape public attitudes domestically.

2. Social media has become co-opted by many authoritarian regimes. In 26 countries, computational propaganda …


Building Recommendation Systems, Orion Davis Jan 2019

Williams Honors College, Honors Research Projects

Recommendation systems are pieces of software that suggest new items to a user. There are many moving parts to these systems, including the data, the actual recommendation model, data processing, and finally data display. This project explores the role each part plays in the overall system and how to develop a recommendation system for beer from scratch. This project highlights the algorithm behind the recommendations and a user-facing Android application.


Large Scale Online Multiple Kernel Regression With Application To Time-Series Prediction, Doyen Sahoo, Steven C. H. Hoi, Bin Lin Jan 2019

Research Collection School Of Computing and Information Systems

Kernel-based regression represents an important family of learning techniques for solving challenging regression tasks with non-linear patterns. Despite being studied extensively, most of the existing work suffers from two major drawbacks: (i) they are often designed for solving regression tasks in a batch learning setting, making them not only computationally inefficient but also poorly scalable in real-world applications where data arrives sequentially; and (ii) they usually assume that a fixed kernel function is given prior to the learning task, which could result in poor performance if the chosen kernel is inappropriate. To overcome these drawbacks, this work …