Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

Articles 1 - 13 of 13

Full-Text Articles in Databases and Information Systems

Machine Learning In Requirements Elicitation: A Literature Review, Cheligeer Cheligeer, Jingwei Huang, Guosong Wu, Nadia Bhuiyan, Yuan Xu, Yong Zeng Jan 2022

Engineering Management & Systems Engineering Faculty Publications

A growing trend in requirements elicitation is the use of machine learning (ML) techniques to automate the cumbersome requirement handling process. This literature review summarizes and analyzes studies that incorporate ML and natural language processing (NLP) into demand elicitation. We answer the following research questions: (1) What requirement elicitation activities are supported by ML? (2) What data sources are used to build ML-based requirement solutions? (3) What technologies, algorithms, and tools are used to build ML-based requirement elicitation? (4) How to construct an ML-based requirements elicitation method? (5) What are the available tools to support ML-based requirements elicitation methodology? Keywords …


Fair And Diverse Group Formation Based On Multidimensional Features, Mohammed Saad A Alqahtani Dec 2021

Graduate Theses and Dissertations

The goal of group formation is to build a team to accomplish a specific task. Algorithms are being developed to improve both the effectiveness of the teams formed and the efficiency of the group selection process. However, there is concern that team formation algorithms could be biased against minorities due to the algorithms themselves or the data on which they are trained. Hence, it is essential to build fair team formation systems that incorporate demographic information into the process of building the group. Although there has been extensive work on modeling individuals’ expertise for expert recommendation and/or team formation, there has been …


Information Extraction And Classification On Journal Papers, Lei Yu Nov 2021

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

The importance of journals for diffusing the results of scientific research has increased considerably. In the digital era, Portable Document Format (PDF) became the established format of electronic journal articles. This structured form, combined with a regular and wide dissemination, spread scientific advancements easily and quickly. However, the rapidly increasing number of published scientific articles requires more time and effort for systematic literature reviews, searches and screens. The comprehension and extraction of useful information from the digital documents is also a challenging task, due to the complex structure of PDF.

To help a soil science team from the United States …


Counting And Sampling Small Structures In Graph And Hypergraph Data Streams, Themistoklis Haris Jun 2021

Dartmouth College Undergraduate Theses

In this thesis, we explore the problem of approximating the number of elementary substructures called simplices in large k-uniform hypergraphs. The hypergraphs are assumed to be too large to be stored in memory, so we adopt a data stream model, where the hypergraph is defined by a sequence of hyperedges.

First we propose an algorithm that (ε, δ)-estimates the number of simplices using O(m^(1+1/k) / T) bits of space. In addition, we prove that no constant-pass streaming algorithm can (ε, δ)-approximate the number of simplices using less than O(m^(1+1/k) / T) bits of space. Thus …


The Global Disinformation Order: 2019 Global Inventory Of Organised Social Media Manipulation, Samantha Bradshaw, Philip N. Howard Jan 2019

Copyright, Fair Use, Scholarly Communication, etc.

Executive Summary

Over the past three years, we have monitored the global organization of social media manipulation by governments and political parties. Our 2019 report analyses the trends of computational propaganda and the evolving tools, capacities, strategies, and resources.

1. Evidence of organized social media manipulation campaigns which have taken place in 70 countries, up from 48 countries in 2018 and 28 countries in 2017. In each country, there is at least one political party or government agency using social media to shape public attitudes domestically.

2. Social media has become co-opted by many authoritarian regimes. In 26 countries, computational propaganda …


Accelerating Dynamic Graph Analytics On GPUs, Mo Shan, Yuchen Li, Bingsheng He, Kian-Lee Tan Aug 2017

Research Collection School Of Computing and Information Systems

As graph analytics often involves compute-intensive operations, GPUs have been extensively used to accelerate the processing. However, in many applications such as social networks, cyber security, and fraud detection, the representative graphs evolve frequently, and one has to perform a rebuild of the graph structure on GPUs to incorporate the updates. Hence, rebuilding the graphs becomes the bottleneck of processing high-speed graph streams. In this paper, we propose a GPU-based dynamic graph storage scheme to support existing graph algorithms easily. Furthermore, we propose parallel update algorithms to support efficient stream updates so that the maintained graph is immediately available for high-speed analytic processing …


Negative Factor: Improving Regular-Expression Matching In Strings, Xiaochun Yang, Tao Qiu, Bin Wang, Baihua Zheng, Yaoshu Wang, Chen Li Feb 2016

Research Collection School Of Computing and Information Systems

The problem of finding matches of a regular expression (RE) on a string exists in many applications such as text editing, biosequence search, and shell commands. Existing techniques first identify candidates using substrings in the RE, then verify each of them using an automaton. These techniques become inefficient when there are many candidate occurrences that need to be verified. In this paper we propose a novel technique that prunes false negatives by utilizing negative factors, which are substrings that cannot appear in an answer. A main advantage of the technique is that it can be integrated with many existing algorithms …
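The candidate-then-verify baseline that the abstract describes can be sketched as follows. This is a minimal illustration only, not the paper's negative-factor technique: the function name `find_matches`, the sample pattern, and the assumption that every match begins with a known literal prefix are all hypothetical, and Python's `re` engine stands in for the verification automaton.

```python
import re

def find_matches(text, pattern, prefix_literal):
    """Return (start, end) spans of matches of `pattern` in `text`.

    Assumption (hypothetical): every match starts with `prefix_literal`,
    so occurrences of that literal are the only candidate positions.
    """
    compiled = re.compile(pattern)
    spans = []
    # Step 1: identify candidates with a cheap substring search.
    pos = text.find(prefix_literal)
    while pos != -1:
        # Step 2: verify each candidate with the regex engine,
        # anchored at the candidate position.
        m = compiled.match(text, pos)
        if m:
            spans.append(m.span())
        pos = text.find(prefix_literal, pos + 1)
    return spans

text = "xxab12cd yy abcd ab7cd"
print(find_matches(text, r"ab[0-9]+cd", "ab"))  # [(2, 8), (17, 22)]
```

In this sample, the candidate at position 12 ("abcd") is verified and rejected. Verification work of this kind is what becomes costly when there are many candidate occurrences.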


In-Degree Dynamics Of Large-Scale P2P Systems, Zhongmei Yao, Daren B. H. Cline, Dmitri Loguinov Jan 2015

Zhongmei Yao

This paper builds a complete modeling framework for understanding user churn and in-degree dynamics in unstructured P2P systems in which each user can be viewed as a stationary alternating renewal process. While the classical Poisson result on the superposition of n stationary renewal processes for n→∞ requires that each point process become sparser as n increases, it is often difficult to rigorously show this condition in practice. In this paper, we first prove that despite user heterogeneity and non-Poisson arrival dynamics, a superposition of edge-arrival processes to a live user under uniform selection converges to a Poisson process when …


Global Immutable Region Computation, Jilian Zhang, Kyriakos Mouratidis, Hwee Hwa Pang Jun 2014

Research Collection School Of Computing and Information Systems

A top-k query shortlists the k records in a dataset that best match the user's preferences. To indicate her preferences, the user typically determines a numeric weight for each data dimension (i.e., attribute). We refer to these weights collectively as the query vector. Based on this vector, each data record is implicitly mapped to a score value (via a weighted sum function). The records with the k largest scores are reported as the result. In this paper we propose an auxiliary feature to standard top-k query processing. Specifically, we compute the maximal locus within which the query vector incurs no …
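The standard top-k scoring step described above can be sketched as follows. This is a minimal illustration under assumed data: the function name `top_k`, the sample records, and the weights are hypothetical, and the sketch covers only the weighted-sum ranking, not the immutable region computation the paper proposes.

```python
import heapq

def top_k(records, weights, k):
    """Return the k records with the largest weighted-sum scores."""
    def score(record):
        # Weighted sum of the record's attributes under the query vector.
        return sum(w * x for w, x in zip(weights, record))
    # nlargest returns results ordered from highest to lowest score.
    return heapq.nlargest(k, records, key=score)

# Hypothetical two-dimensional records and query vector.
records = [(0.9, 0.2), (0.4, 0.8), (0.6, 0.6), (0.1, 0.3)]
weights = (0.7, 0.3)
print(top_k(records, weights, 2))  # [(0.9, 0.2), (0.6, 0.6)]
```

With these weights the scores are roughly 0.69, 0.52, 0.60, and 0.16, so the first and third records are reported. Changing the weights enough to reorder those scores would change the result.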


L-Opacity: Linkage-Aware Graph Anonymization, Sadegh Nobari, Panagiotis Karras, Hwee Hwa Pang, Stephane Bressan Mar 2014

Research Collection School Of Computing and Information Systems

The wealth of information contained in online social networks has created a demand for the publication of such data as graphs. Yet, publication, even after identities have been removed, poses a privacy threat. Past research has suggested ways to publish graph data in a way that prevents the re-identification of nodes. However, even when identities are effectively hidden, an adversary may still be able to infer linkage between individuals with sufficiently high confidence. In this paper, we focus on the privacy threat arising from such link disclosure. We suggest L-opacity, a sufficiently strong privacy model that aims to control an …


L-Opacity: Linkage-Aware Graph Anonymization, Sadegh Nobari, Panagiotis Karras, Hwee Hwa Pang, Stephane Bressan Feb 2014

Sadegh Nobari

The wealth of information contained in online social networks has created a demand for the publication of such data as graphs. Yet, publication, even after identities have been removed, poses a privacy threat. Past research has suggested ways to publish graph data in a way that prevents the re-identification of nodes. However, even when identities are effectively hidden, an adversary may still be able to infer linkage between individuals with sufficiently high confidence. In this paper, we focus on the privacy threat arising from such link disclosure. We suggest L-opacity, a sufficiently strong privacy model that aims to control an …


In-Degree Dynamics Of Large-Scale P2P Systems, Zhongmei Yao, Daren B. H. Cline, Dmitri Loguinov Jan 2011

Computer Science Faculty Publications

This paper builds a complete modeling framework for understanding user churn and in-degree dynamics in unstructured P2P systems in which each user can be viewed as a stationary alternating renewal process. While the classical Poisson result on the superposition of n stationary renewal processes for n→∞ requires that each point process become sparser as n increases, it is often difficult to rigorously show this condition in practice. In this paper, we first prove that despite user heterogeneity and non-Poisson arrival dynamics, a superposition of edge-arrival processes to a live user under uniform selection converges to a Poisson process when …


A Multi-Scale Tikhonov Regularization Scheme For Implicit Surface Modeling, Jianke Zhu, Steven C. H. Hoi, Michael R. Lyu Jun 2007

Research Collection School Of Computing and Information Systems

Kernel machines have recently been considered as a promising solution for implicit surface modelling. A key challenge of machine learning solutions is how to fit implicit shape models from large-scale sets of point cloud samples efficiently. In this paper, we propose a fast solution for approximating implicit surfaces based on a multi-scale Tikhonov regularization scheme. The optimization of our scheme is formulated into a sparse linear equation system, which can be efficiently solved by factorization methods. Different from traditional approaches, our scheme does not employ auxiliary off-surface points, which not only saves the computational cost but also avoids the problem …