Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 30 of 40

Full-Text Articles in Physical Sciences and Mathematics

Relation Preserving Triplet Mining For Stabilising The Triplet Loss In Re-Identification Systems, Adhiraj Ghosh, Kuruparan Shanmugalingam, Wen-Yan Lin Jan 2023

Relation Preserving Triplet Mining For Stabilising The Triplet Loss In Re-Identification Systems, Adhiraj Ghosh, Kuruparan Shanmugalingam, Wen-Yan Lin

Research Collection School Of Computing and Information Systems

Object appearances change dramatically with pose variations. This creates a challenge for embedding schemes that seek to map instances with the same object ID to locations that are as close as possible. This issue becomes significantly heightened in complex computer vision tasks such as re-identification(reID). In this paper, we suggest that these dramatic appearance changes are indications that an object ID is composed of multiple natural groups, and it is counterproductive to forcefully map instances from different groups to a common location. This leads us to introduce Relation Preserving Triplet Mining (RPTM), a feature matching guided triplet mining scheme, that …


Machine Learning In Requirements Elicitation: A Literature Review, Cheligeer Cheligeer, Jingwei Huang, Guosong Wu, Nadia Bhuiyan, Yuan Xu, Yong Zeng Jan 2022

Machine Learning In Requirements Elicitation: A Literature Review, Cheligeer Cheligeer, Jingwei Huang, Guosong Wu, Nadia Bhuiyan, Yuan Xu, Yong Zeng

Engineering Management & Systems Engineering Faculty Publications

A growing trend in requirements elicitation is the use of machine learning (ML) techniques to automate the cumbersome requirement handling process. This literature review summarizes and analyzes studies that incorporate ML and natural language processing (NLP) into demand elicitation. We answer the following research questions: (1) What requirement elicitation activities are supported by ML? (2) What data sources are used to build ML-based requirement solutions? (3) What technologies, algorithms, and tools are used to build ML-based requirement elicitation? (4) How to construct an ML-based requirements elicitation method? (5) What are the available tools to support ML-based requirements elicitation methodology? Keywords …


Fair And Diverse Group Formation Based On Multidimensional Features, Mohammed Saad A Alqahtani Dec 2021

Fair And Diverse Group Formation Based On Multidimensional Features, Mohammed Saad A Alqahtani

Graduate Theses and Dissertations

The goal of group formation is to build a team to accomplish a specific task. Algorithms are being developed to improve the team's effectiveness so formed and the efficiency of the group selection process. However, there is concern that team formation algorithms could be biased against minorities due to the algorithms themselves or the data on which they are trained. Hence, it is essential to build fair team formation systems that incorporate demographic information into the process of building the group. Although there has been extensive work on modeling individuals’ expertise for expert recommendation and/or team formation, there has been …


Information Extraction And Classification On Journal Papers, Lei Yu Nov 2021

Information Extraction And Classification On Journal Papers, Lei Yu

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

The importance of journals for diffusing the results of scientific research has increased considerably. In the digital era, Portable Document Format (PDF) became the established format of electronic journal articles. This structured form, combined with a regular and wide dissemination, spread scientific advancements easily and quickly. However, the rapidly increasing numbers of published scientific articles requires more time and effort on systematic literature reviews, searches and screens. The comprehension and extraction of useful information from the digital documents is also a challenging task, due to the complex structure of PDF.

To help a soil science team from the United States …


Exploratory Search With Archetype-Based Language Models, Brent D. Davis Aug 2021

Exploratory Search With Archetype-Based Language Models, Brent D. Davis

Electronic Thesis and Dissertation Repository

This dissertation explores how machine learning, natural language processing and information retrieval may assist the exploratory search task. Exploratory search is a search where the ideal outcome of the search is unknown, and thus the ideal language to use in a retrieval query to match it is unavailable. Three algorithms represent the contribution of this work. Archetype-based Modeling and Search provides a way to use previously identified archetypal documents relevant to an archetype to form a notion of similarity and find related documents that match the defined archetype. This is beneficial for exploratory search as it can generalize beyond standard …


Counting And Sampling Small Structures In Graph And Hypergraph Data Streams, Themistoklis Haris Jun 2021

Counting And Sampling Small Structures In Graph And Hypergraph Data Streams, Themistoklis Haris

Dartmouth College Undergraduate Theses

In this thesis, we explore the problem of approximating the number of elementary substructures called simplices in large k-uniform hypergraphs. The hypergraphs are assumed to be too large to be stored in memory, so we adopt a data stream model, where the hypergraph is defined by a sequence of hyperedges.

First we propose an algorithm that (ε, δ)-estimates the number of simplices using O(m1+1/k / T) bits of space. In addition, we prove that no constant-pass streaming algorithm can (ε, δ)- approximate the number of simplices using less than O( m 1+1/k / T ) bits of space. Thus …


Changing The Focus: Worker-Centric Optimization In Human-In-The-Loop Computations, Mohammadreza Esfandiari Aug 2020

Changing The Focus: Worker-Centric Optimization In Human-In-The-Loop Computations, Mohammadreza Esfandiari

Dissertations

A myriad of emerging applications from simple to complex ones involve human cognizance in the computation loop. Using the wisdom of human workers, researchers have solved a variety of problems, termed as “micro-tasks” such as, captcha recognition, sentiment analysis, image categorization, query processing, as well as “complex tasks” that are often collaborative, such as, classifying craters on planetary surfaces, discovering new galaxies (Galaxyzoo), performing text translation. The current view of “humans-in-the-loop” tends to see humans as machines, robots, or low-level agents used or exploited in the service of broader computation goals. This dissertation is developed to shift the focus back …


The Global Disinformation Order: 2019 Global Inventory Of Organised Social Media Manipulation, Samantha Bradshaw, Philip N. Howard Jan 2019

The Global Disinformation Order: 2019 Global Inventory Of Organised Social Media Manipulation, Samantha Bradshaw, Philip N. Howard

Copyright, Fair Use, Scholarly Communication, etc.

Executive Summary

Over the past three years, we have monitored the global organization of social media manipulation by governments and political parties. Our 2019 report analyses the trends of computational propaganda and the evolving tools, capacities, strategies, and resources.

1. Evidence of organized social media manipulation campaigns which have taken place in 70 countries, up from 48 countries in 2018 and 28 countries in 2017. In each country, there is at least one political party or government agency using social media to shape public attitudes domestically.

2.Social media has become co-opted by many authoritarian regimes. In 26 countries, computational propaganda …


Equity Trading Evaluation Strategies In Switzerland After The European Mifid Ii, Linn Kristina Karstadt Jan 2018

Equity Trading Evaluation Strategies In Switzerland After The European Mifid Ii, Linn Kristina Karstadt

Walden Dissertations and Doctoral Studies

Swiss bank traders are affected by technological and regulatory challenges, which may affect their broker voting process and may result in a change of trading and evaluation behavior in 2018. Compounded challenges exist when broker evaluation strategies are not effective or Markets in Financial Instruments Directive (MiFID) II compliant. This qualitative, single case study, built on efficient capital market hypothesis and innovative disruption theory, was focused on effective broker evaluation strategies after MiFID II in Switzerland. The sample consisted of 4 buy-side traders, who shared their unique perspectives. Methodological triangulation was achieved through semistructured interviews, a review of the institution's …


Accelerating Dynamic Graph Analytics On Gpus, Mo Shan, Yuchen Li, Bingsheng He, Kian-Lee Tan Aug 2017

Accelerating Dynamic Graph Analytics On Gpus, Mo Shan, Yuchen Li, Bingsheng He, Kian-Lee Tan

Research Collection School Of Computing and Information Systems

As graph analytics often involves compute-intensive operations,GPUs have been extensively used to accelerate the processing. However, in many applications such as social networks, cyber security, and fraud detection, their representative graphs evolve frequently and one has to perform are build of the graph structure on GPUs to incorporate the updates. Hence, rebuilding the graphs becomes the bottleneck of processing high-speed graph streams. In this paper,we propose a GPU-based dynamic graph storage scheme to support existing graph algorithms easily. Furthermore,we propose parallel update algorithms to support efficient stream updates so that the maintained graph is immediately available for high-speed analytic processing …


On Effective Location-Aware Music Recommendation, Zhiyong Cheng, Jialie Shen Apr 2016

On Effective Location-Aware Music Recommendation, Zhiyong Cheng, Jialie Shen

Research Collection School Of Computing and Information Systems

Rapid advances in mobile devices and cloud-based music service now allow consumers to enjoy music any-time and anywhere. Consequently, there has been an increasing demand in studying intelligent techniques to facilitate context-aware music recommendation. However, one important context that is generally overlooked is user's venue, which often includes surrounding atmosphere, correlates with activities, and greatly influences the user's music preferences. In this article, we present a novel venue-aware music recommender system called VenueMusic to effectively identify suitable songs for various types of popular venues in our daily lives. Toward this goal, a Location-aware Topic Model (LTM) is proposed to (i) …


Negative Factor: Improving Regular-Expression Matching In Strings, Xiaochun Yang, Tao Qiu, Bin Wang, Baihua Zheng, Yaoshu Wang, Chen Li Feb 2016

Negative Factor: Improving Regular-Expression Matching In Strings, Xiaochun Yang, Tao Qiu, Bin Wang, Baihua Zheng, Yaoshu Wang, Chen Li

Research Collection School Of Computing and Information Systems

The problem of finding matches of a regular expression (RE) on a string exists in many applications such as text editing, biosequence search, and shell commands. Existing techniques first identify candidates using substrings in the RE, then verify each of them using an automaton. These techniques become inefficient when there are many candidate occurrences that need to be verified. In this paper we propose a novel technique that prunes false negatives by utilizing negative factors, which are substrings that cannot appear in an answer. A main advantage of the technique is that it can be integrated with many existing algorithms …


Answering Why-Not Questions On Reverse Top-K Queries, Yunjun Gao, Qing Liu, Gang Chen, Baihua Zheng, Linlin Zhou Sep 2015

Answering Why-Not Questions On Reverse Top-K Queries, Yunjun Gao, Qing Liu, Gang Chen, Baihua Zheng, Linlin Zhou

Research Collection School Of Computing and Information Systems

Why-not questions, which aim to seek clarifications on the missing tuples for query results, have recently received considerable attention from the database community. In this paper, we systematically explore why-not questions on reverse top-k queries, owing to its importance in multi-criteria decision making. Given an initial reverse top-k query and a missing/why-not weighting vector set Wm that is absent from the query result, why-not questions on reverse top-k queries explain why Wm does not appear in the query result and provide suggestions on how to refine the initial query with minimum penalty to include Wm in the refined query result. …


In-Degree Dynamics Of Large-Scale P2p Systems, Zhongmei Yao, Daren B. H. Cline, Dmitri Loguinov Jan 2015

In-Degree Dynamics Of Large-Scale P2p Systems, Zhongmei Yao, Daren B. H. Cline, Dmitri Loguinov

Zhongmei Yao

This paper builds a complete modeling framework for understanding user churn and in-degree dynamics in unstructured P2P systems in which each user can be viewed as a stationary alternating renewal process. While the classical Poisson result on the superposition of n stationary renewal processes for n→∞ requires that each point process become sparser as n increases, it is often difficult to rigorously show this condition in practice. In this paper, we first prove that despite user heterogeneity and non-Poisson arrival dynamics, a superposition of edge-arrival processes to a live user under uniform selection converges to a Poisson process when …


Extracting City Traffic Events From Social Streams, Pramod Anantharam, Payam Barnaghi, Krishnaprasad Thirunarayan, Amit P. Sheth Jan 2015

Extracting City Traffic Events From Social Streams, Pramod Anantharam, Payam Barnaghi, Krishnaprasad Thirunarayan, Amit P. Sheth

Kno.e.sis Publications

Cities are composed of complex systems with physical, cyber, and social components. Current works on extracting and understanding city events mainly rely on technology enabled infrastructure to observe and record events. In this work, we propose an approach to leverage citizen observations of various city systems and services such as traffic, public transport, water supply, weather, sewage, and public safety as a source of city events. We investigate the feasibility of using such textual streams for extracting city events from annotated text. We formalize the problem of annotating social streams such as microblogs as a sequence labeling problem. We present …


Global Immutable Region Computation, Jilian Zhang, Kyriakos Mouratidis, Hwee Hwa Pang Jun 2014

Global Immutable Region Computation, Jilian Zhang, Kyriakos Mouratidis, Hwee Hwa Pang

Research Collection School Of Computing and Information Systems

A top-k query shortlists the k records in a dataset that best match the user's preferences. To indicate her preferences, the user typically determines a numeric weight for each data dimension (i.e., attribute). We refer to these weights collectively as the query vector. Based on this vector, each data record is implicitly mapped to a score value (via a weighted sum function). The records with the k largest scores are reported as the result. In this paper we propose an auxiliary feature to standard top-k query processing. Specifically, we compute the maximal locus within which the query vector incurs no …


On Finding The Point Where There Is No Return: Turning Point Mining On Game Data, Wei Gong, Ee Peng Lim, Feida Zhu, Achananuparp Palakorn, David Lo Apr 2014

On Finding The Point Where There Is No Return: Turning Point Mining On Game Data, Wei Gong, Ee Peng Lim, Feida Zhu, Achananuparp Palakorn, David Lo

Research Collection School Of Computing and Information Systems

Gaming expertise is usually accumulated through playing or watching many game instances, and identifying critical moments in these game instances called turning points. Turning point rules (shorten as TPRs) are game patterns that almost always lead to some irreversible outcomes. In this paper, we formulate the notion of irreversible outcome property which can be combined with pattern mining so as to automatically extract TPRs from any given game datasets. We specifically extend the well-known PrefixSpan sequence mining algorithm by incorporating the irreversible outcome property. To show the usefulness of TPRs, we apply them to Tetris, a popular game. We mine …


Algorithmic Accountability, Tamara Kneese Mar 2014

Algorithmic Accountability, Tamara Kneese

Media Studies

Accountability is fundamentally about checks and balances to power. In theory, both government and corporations are kept accountable through social, economic, and political mechanisms. Journalism and public advocates serve as an additional tool to hold powerful institutions and individuals accountable. But in a world of data and algorithms, accountability is often murky. Beyond questions about whether the market is sufficient or governmental regulation is necessary, how should algorithms be held accountable? For example what is the role of the fourth estate in holding data-oriented practices accountable?


L-Opacity: Linkage-Aware Graph Anonymization, Sadegh Nobari, Panagiotis Karras, Hwee Hwa Pang, Stephane Bressan Mar 2014

L-Opacity: Linkage-Aware Graph Anonymization, Sadegh Nobari, Panagiotis Karras, Hwee Hwa Pang, Stephane Bressan

Research Collection School Of Computing and Information Systems

The wealth of information contained in online social networks has created a demand for the publication of such data as graphs. Yet, publication, even after identities have been removed, poses a privacy threat. Past research has suggested ways to publish graph data in a way that prevents the re-identification of nodes. However, even when identities are effectively hidden, an adversary may still be able to infer linkage between individuals with sufficiently high confidence. In this paper, we focus on the privacy threat arising from such link disclosure. We suggest L-opacity, a sufficiently strong privacy model that aims to control an …


L-Opacity: Linkage-Aware Graph Anonymization, Sadegh Nobari, Panagiotis Karras, Hwee Hwa Pang, Stephane Bressan Feb 2014

L-Opacity: Linkage-Aware Graph Anonymization, Sadegh Nobari, Panagiotis Karras, Hwee Hwa Pang, Stephane Bressan

Sadegh Nobari

The wealth of information contained in online social networks has created a demand for the publication of such data as graphs. Yet, publication, even after identities have been removed, poses a privacy threat. Past research has suggested ways to publish graph data in a way that prevents the re-identification of nodes. However, even when identities are effectively hidden, an adversary may still be able to infer linkage between individuals with sufficiently high confidence. In this paper, we focus on the privacy threat arising from such link disclosure. We suggest L-opacity, a sufficiently strong privacy model that aims to control an …


Using Micro-Reviews To Select An Efficient Set Of Reviews, Thanh-Son Nguyen, Hady W. Lauw, Panayiotis Tsaparas Nov 2013

Using Micro-Reviews To Select An Efficient Set Of Reviews, Thanh-Son Nguyen, Hady W. Lauw, Panayiotis Tsaparas

Research Collection School Of Computing and Information Systems

Online reviews are an invaluable resource for web users trying to make decisions regarding products or services. However, the abundance of review content, as well as the unstructured, lengthy, and verbose nature of reviews make it hard for users to locate the appropriate reviews, and distill the useful information. With the recent growth of social networking and micro-blogging services, we observe the emergence of a new type of online review content, consisting of bite-sized, 140 character-long reviews often posted reactively on the spot via mobile devices. These micro-reviews are short, concise, and focused, nicely complementing the lengthy, elaborate, and verbose …


Roundtriprank: Graph-Based Proximity With Importance And Specificity, Yuan Fang, Kevin Chen-Chuan Chang, Hady W. Lauw Apr 2013

Roundtriprank: Graph-Based Proximity With Importance And Specificity, Yuan Fang, Kevin Chen-Chuan Chang, Hady W. Lauw

Research Collection School Of Computing and Information Systems

Graph-based proximity has many applications with different ranking needs. However, most previous works only stress the sense of importance by finding "popular” results for a query. Often times important results are overly general without being well-tailored to the query, lacking a sense of specificity— which only emerges recently. Even then, the two senses are treated independently, and only combined empirically. In this paper, we generalize the well-studied importance-based random walk into a round trip and develop RoundTripRank, seamlessly integrating specificity and importance in one coherent process. We also recognize the need for a flexible trade-off between the two senses, and …


Multimedia Recommendation: Technology And Techniques, Jialie Shen, Meng Wang, Shuicheng Yan, Peng Cui Jan 2013

Multimedia Recommendation: Technology And Techniques, Jialie Shen, Meng Wang, Shuicheng Yan, Peng Cui

Research Collection School Of Computing and Information Systems

In recent years, we have witnessed a rapid growth in the availability of digital multimedia on various application platforms and domains. Consequently, the problem of information overload has become more and more serious. In order to tackle the challenge, various multimedia recommendation technologies have been developed by different research communities (e.g., multimedia systems, information retrieval, machine learning and computer version). Meanwhile, many commercial web systems (e.g., Flick, YouTube, and Last.fm) have successfully applied recommendation techniques to provide users personalized content and services in a convenient and flexible way. When looking back, the information retrieval (IR) community has a long history …


Computing Inconsistency Measure Based On Paraconsistent Semantics, Pascal Hitzler, Yue Ma, Guilin Qi Dec 2011

Computing Inconsistency Measure Based On Paraconsistent Semantics, Pascal Hitzler, Yue Ma, Guilin Qi

Computer Science and Engineering Faculty Publications

Measuring inconsistency in knowledge bases has been recognized as an important problem in several research areas. Many methods have been proposed to solve this problem and a main class of them is based on some kind of paraconsistent semantics. However, existing methods suffer from two limitations: (i) they are mostly restricted to propositional knowledge bases; (ii) very few of them discuss computational aspects of computing inconsistency measures. In this article, we try to solve these two limitations by exploring algorithms for computing an inconsistency measure of first-order knowledge bases. After introducing a four-valued semantics for first-order logic, we define an …


Searching Patterns For Relation Extraction Over The Web: Rediscovering The Pattern-Relation Duality, Yuan Fang, Kevin Chen-Chuan Chang Feb 2011

Searching Patterns For Relation Extraction Over The Web: Rediscovering The Pattern-Relation Duality, Yuan Fang, Kevin Chen-Chuan Chang

Research Collection School Of Computing and Information Systems

While tuple extraction for a given relation has been an active research area, its dual problem of pattern search- to find and rank patterns in a principled way- has not been studied explicitly. In this paper, we propose and address the problem of pattern search, in addition to tuple extraction. As our objectives, we stress reusability for pattern search and scalability of tuple extraction, such that our approach can be applied to very large corpora like the Web. As the key foundation, we propose a conceptual model PRDualRank to capture the notion of precision and recall for both tuples and …


In-Degree Dynamics Of Large-Scale P2p Systems, Zhongmei Yao, Daren B. H. Cline, Dmitri Loguinov Jan 2011

In-Degree Dynamics Of Large-Scale P2p Systems, Zhongmei Yao, Daren B. H. Cline, Dmitri Loguinov

Computer Science Faculty Publications

This paper builds a complete modeling framework for understanding user churn and in-degree dynamics in unstructured P2P systems in which each user can be viewed as a stationary alternating renewal process. While the classical Poisson result on the superposition of n stationary renewal processes for n→∞ requires that each point process become sparser as n increases, it is often difficult to rigorously show this condition in practice. In this paper, we first prove that despite user heterogeneity and non-Poisson arrival dynamics, a superposition of edge-arrival processes to a live user under uniform selection converges to a Poisson process when …


A Fair Assignment Algorithm For Multiple Preference Queries, Leong Hou U, Nikos Mamoulis, Kyriakos Mouratidis Dec 2010

A Fair Assignment Algorithm For Multiple Preference Queries, Leong Hou U, Nikos Mamoulis, Kyriakos Mouratidis

Kyriakos MOURATIDIS

Consider an internship assignment system, where at the end of each academic year, interested university students search and apply for available positions, based on their preferences (e.g., nature of the job, salary, office location, etc). In a variety of facility, task or position assignment contexts, users have personal preferences expressed by different weights on the attributes of the searched objects. Although individual preference queries can be evaluated by selecting the object in the database with the highest aggregate score, in the case of multiple simultaneous requests, a single object cannot be assigned to more than one users. The challenge is …


Capacity Constrained Assignment In Spatial Databases, Hou U Leong, Man Lung Yiu, Kyriakos Mouratidis, Nikos Mamoulis Dec 2010

Capacity Constrained Assignment In Spatial Databases, Hou U Leong, Man Lung Yiu, Kyriakos Mouratidis, Nikos Mamoulis

Kyriakos MOURATIDIS

Given a point set P of customers (e.g., WiFi receivers) and a point set Q of service providers (e.g., wireless access points), where each q 2 Q has a capacity q.k, the capacity constrained assignment (CCA) is a matching M Q × P such that (i) each point q 2 Q (p 2 P) appears at most k times (at most nce) in M, (ii) the size of M is maximized (i.e., it comprises min{|P|,P q2Q q.k} pairs), and (iii) the total assignment cost (i.e., the sum of Euclidean distances within all pairs) is minimized. Thus, the CCA problem is …


Detecting Product Review Spammers Using Rating Behaviors, Ee Peng Lim, Viet-An Nguyen, Nitin Jindal, Bing Liu, Hady Wirawan Lauw Oct 2010

Detecting Product Review Spammers Using Rating Behaviors, Ee Peng Lim, Viet-An Nguyen, Nitin Jindal, Bing Liu, Hady Wirawan Lauw

Research Collection School Of Computing and Information Systems

This paper aims to detect users generating spam reviews or review spammers. We identify several characteristic be- haviors of review spammers and model these behaviors so as to detect the spammers. In particular, we seek to model the following behaviors. First, spammers may target specific products or product groups in order to maximize their im- pact. Second, they tend to deviate from the other reviewers in their ratings of products. We propose scoring methods to measure the degree of spam for each reviewer and apply them on an Amazon review dataset. We then select a sub- set of highly suspicious …


A Fair Assignment Algorithm For Multiple Preference Queries, Leong Hou U, Nikos Mamoulis, Kyriakos Mouratidis Aug 2009

A Fair Assignment Algorithm For Multiple Preference Queries, Leong Hou U, Nikos Mamoulis, Kyriakos Mouratidis

Research Collection School Of Computing and Information Systems

Consider an internship assignment system, where at the end of each academic year, interested university students search and apply for available positions, based on their preferences (e.g., nature of the job, salary, office location, etc). In a variety of facility, task or position assignment contexts, users have personal preferences expressed by different weights on the attributes of the searched objects. Although individual preference queries can be evaluated by selecting the object in the database with the highest aggregate score, in the case of multiple simultaneous requests, a single object cannot be assigned to more than one users. The challenge is …