Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Computer Sciences

Selected Works

2012

Articles 1 - 30 of 364

Full-Text Articles in Entire DC Network

Computing Immutable Regions For Subspace Top-K Queries, Kyriakos Mouratidis, Hwee Hwa Pang Dec 2012

Kyriakos MOURATIDIS

Given a high-dimensional dataset, a top-k query can be used to shortlist the k tuples that best match the user’s preferences. Typically, these preferences regard a subset of the available dimensions (i.e., attributes) whose relative significance is expressed by user-specified weights. Along with the query result, we propose to compute for each involved dimension the maximal deviation to the corresponding weight for which the query result remains valid. The derived weight ranges, called immutable regions, are useful for performing sensitivity analysis, for fine-tuning the query weights, etc. In this paper, we focus on top-k queries with linear preference functions over …


Semantically Related Software Terms And Their Taxonomy By Leveraging Collaborative Tagging, Shaowei Wang, David Lo, Lingxiao Jiang Dec 2012

David LO

Millions of people, including those in the software engineering communities, have turned to microblogging services, such as Twitter, as a means to quickly disseminate information. A number of past studies by Treude et al., Storey, and Yuan et al. have shown that a wealth of interesting information is stored in these microblogs. However, microblogs also contain a large amount of noisy content that is less relevant to software developers in engineering software systems. In this work, we perform a preliminary study to investigate the feasibility of automatic classification of microblogs into two categories: relevant and irrelevant to engineering software systems. …


When Would This Bug Get Reported?, Ferdian Thung, David Lo, Lingxiao Jiang, Lucia Lucia, Foyzur Rahman, Premkumar Devanbu Dec 2012

David LO

Millions of people, including those in the software engineering communities, have turned to microblogging services, such as Twitter, as a means to quickly disseminate information. A number of past studies by Treude et al., Storey, and Yuan et al. have shown that a wealth of interesting information is stored in these microblogs. However, microblogs also contain a large amount of noisy content that is less relevant to software developers in engineering software systems. In this work, we perform a preliminary study to investigate the feasibility of automatic classification of microblogs into two categories: relevant and irrelevant to engineering software systems. …


Mining Indirect Antagonistic Communities From Social Interactions, Kuan Zhang, David Lo, Ee Peng Lim, Philips Kokoh Prasetyo Dec 2012

David LO

Antagonistic communities refer to groups of people with opposite tastes, opinions, and factions within a community. Given a set of interactions among people in a community, we develop a novel pattern mining approach to mine a set of antagonistic communities. In particular, based on a set of user-specified thresholds, we extract a set of pairs of communities that behave in opposite ways with one another. We focus on extracting a compact lossless representation based on the concept of closed patterns to prevent exploding the number of mined antagonistic communities. We also present a variation of the algorithm using a divide …


Searching Connected API Subgraph Via Text Phrases, Wing-Kwan Chan, Hong Cheng, David Lo Dec 2012

David LO

Reusing APIs of existing libraries is a common practice during software development, but searching for suitable APIs and their usages can be time-consuming [6]. In this paper, we study a new and more practical approach to help users find usages of APIs given only simple text phrases, when users have limited knowledge about an API library. We model API invocations as an API graph and aim to find an optimum connected subgraph that meets users' search needs. The problem is challenging since the search space in an API graph is huge. We start with a greedy subgraph search algorithm which …


Learning Extended FSA From Software: An Empirical Assessment, David Lo, Leonardo Mariani, Mauro Santoro Dec 2012

David LO

A number of techniques that infer finite state automata from execution traces have been used to support test and analysis activities. Some of these techniques can produce automata that integrate information about the data-flow, that is, they also represent how data values affect the operations executed by programs. The integration of information about operation sequences and data values into a unique model is indeed conceptually useful to accurately represent the behavior of a program. However, it is still unclear whether handling heterogeneous types of information, such as operation sequences and data values, necessarily produces higher quality models or not. In …


Scenario-Based And Value-Based Specification Mining: Better Together, David Lo, Shahar Maoz Dec 2012

David LO

Specification mining takes execution traces as input and extracts likely program invariants, which can be used for comprehension, verification, and evolution related tasks. In this work we integrate scenario-based specification mining, which uses a data-mining algorithm to suggest ordering constraints in the form of live sequence charts, an inter-object, visual, modal, scenario-based specification language, with mining of value-based invariants, which detects likely invariants holding at specific program points. The key to the integration is a technique we call scenario-based slicing, running on top of the mining algorithms to distinguish the scenario-specific invariants from the general ones. The resulting suggested specifications …


Observatory Of Trends In Software Related Microblogs, Achananuparp Palakorn, Nelman Lubis Ibrahim, Yuan Tian, David Lo, Ee Peng Lim Dec 2012

David LO

Microblogging has recently become a popular means to disseminate information among millions of people. Interestingly, software developers also use microblogs to communicate with one another. Unlike traditional media, microblog users tend to focus on recency and informality of content. Many tweets are more personal and opinionated than traditional news reports. Thus, by analyzing microblogs, one could get up-to-date information about what people are interested in or feel toward a particular topic. In this paper, we describe our microblog observatory that aggregates more than 70,000 Twitter feeds, captures software-related tweets, and computes trends from …


Kbe-Anonymity: Test Data Anonymization For Evolving Programs, - Lucia, David Lo, Lingxiao Jiang, Aditya Budi Dec 2012

David LO

High-quality test data that is useful for effective testing is often available at users’ sites. However, sharing data owned by users with software vendors may raise privacy concerns. Techniques are needed to enable data sharing among data owners and the vendors without compromising data privacy. Evolving programs bring additional challenges because data may be shared multiple times for every version of a program. When multiple versions of the data are cross-referenced, private information could be inferred. Although there are studies addressing the privacy issue of data sharing for testing and debugging, little work has explicitly addressed the challenges when programs …


To What Extent Could We Detect Field Defects? —An Empirical Study Of False Negatives In Static Bug Finding Tools, Ferdian Thung, - Lucia, David Lo, Lingxiao Jiang, Premkumar Devanbu, Foyzur Rahman Dec 2012

David LO

Software defects can cause much loss. Static bug-finding tools are believed to help detect and remove defects. These tools are designed to find programming errors, but do they in fact help prevent actual defects that occur in the field and are reported by users? If these tools had been used, would they have detected these field defects, and generated warnings that would direct programmers to fix them? To answer these questions, we perform an empirical study that investigates the effectiveness of state-of-the-art static bug-finding tools on hundreds of reported and fixed defects extracted from three open source programs: Lucene, Rhino, …


Information Retrieval Based Nearest Neighbor Classification For Fine-Grained Bug Severity Prediction, Yuan Tian, David Lo, Chengnian Sun Dec 2012

David LO

Bugs are prevalent in software systems. Some bugs are critical and need to be fixed right away, whereas others are minor and their fixes could be postponed until resources are available. In this work, we propose a new approach leveraging information retrieval, in particular a BM25-based document similarity function, to automatically predict the severity of bug reports. Our approach automatically analyzes bug reports reported in the past along with their assigned severity labels, and recommends severity labels to newly reported bug reports. Duplicate bug reports are utilized to determine what bug report features, be they textual, ordinal, or categorical, are important. …


An Empirical Study Of Bugs In Machine Learning Systems, Ferdian Thung, Shaowei Wang, David Lo, Lingxiao Jiang Dec 2012

David LO

Many machine learning systems that include various data mining, information retrieval, and natural language processing code and libraries have been used in real-world applications. Search engines, internet advertising systems, and product recommendation systems are sample users of such algorithm-intensive code and libraries. Machine learning code and toolkits have also been used in many recent studies on software mining and analytics that aim to automate various software engineering tasks. With the increasing number of important applications of machine learning systems, the reliability of such systems is also becoming increasingly important. A necessary step for ensuring reliability of such systems is …


Duplicate Bug Report Detection With A Combination Of Information Retrieval And Topic Modeling, Anh Tuan Nguyen, Tung Nguyen, Tien Nguyen, David Lo, Chengnian Sun Dec 2012

David LO

Detecting duplicate bug reports helps reduce triaging efforts and save time for developers in fixing the same issues. Among several automated detection approaches, text-based information retrieval (IR) approaches have been shown to outperform others in terms of both accuracy and time efficiency. However, those IR-based approaches do not reliably detect duplicate reports on the same technical issues written in different descriptive terms. This paper introduces DBTM, a duplicate bug report detection approach that takes advantage of both IR-based features and topic-based features. DBTM models a bug report as a textual document describing certain technical issue(s), and models duplicate bug …


Automatic Classification Of Software Related Microblogs, Philips Kokoh Prasetyo, David Lo, Achananuparp Palakorn, Yuan Tian, Ee Peng Lim Dec 2012

David LO

Millions of people, including those in the software engineering communities, have turned to microblogging services, such as Twitter, as a means to quickly disseminate information. A number of past studies by Treude et al., Storey, and Yuan et al. have shown that a wealth of interesting information is stored in these microblogs. However, microblogs also contain a large amount of noisy content that is less relevant to software developers in engineering software systems. In this work, we perform a preliminary study to investigate the feasibility of automatic classification of microblogs into two categories: relevant and irrelevant to engineering software systems. …


Interactive Fault Localization Leveraging Simple User Feedbacks, Liang Gong, David Lo, Lingxiao Jiang, Hongyu Zhang Dec 2012

David LO

Millions of people, including those in the software engineering communities, have turned to microblogging services, such as Twitter, as a means to quickly disseminate information. A number of past studies by Treude et al., Storey, and Yuan et al. have shown that a wealth of interesting information is stored in these microblogs. However, microblogs also contain a large amount of noisy content that is less relevant to software developers in engineering software systems. In this work, we perform a preliminary study to investigate the feasibility of automatic classification of microblogs into two categories: relevant and irrelevant to engineering software systems. …


Diversity Maximization Speedup For Fault Localization, Liang Gong, David Lo, Lingxiao Jiang, Hongyu Zhang Dec 2012

David LO

Fault localization is useful for reducing debugging effort. However, many fault localization techniques require a non-trivial number of test cases with oracles, which can determine whether a program behaves correctly for every test input. Test oracle creation is expensive because it can take much manual labeling effort. Given a number of test cases to be executed, it is challenging to minimize the number of test cases requiring manual labeling and at the same time achieve good fault localization accuracy. To address this challenge, this paper presents a novel test case selection strategy based on Diversity Maximization Speedup (DMS). DMS orders a set …


Piscine Myocarditis Virus (PMCV) In Wild Atlantic Salmon Salmo Salar, Torstein Tengs Dr. Dec 2012

Dr. Torstein Tengs

Cardiomyopathy syndrome (CMS) is a severe cardiac disease of sea-farmed Atlantic salmon Salmo salar L., but CMS-like lesions have also been found in wild Atlantic salmon. In 2010 a double-stranded RNA virus of the Totiviridae family, provisionally named piscine myocarditis virus (PMCV), was described as the causative agent of CMS. In the present paper we report the first detection of PMCV in wild Atlantic salmon. The study is based on screening of 797 wild Atlantic salmon by real-time RT-PCR. The samples were collected from 35 different rivers along the coast of Norway, and all individuals included in the study were …


ICT For Poverty Alleviation In Pacific Island Nations: Study Of ICTs4D In Fiji, Deogratias Harorimana, Opeti Rokotuinivono, Emali Sewale, Fane Salaiwai, Marica Naulu, Evangelin Roy Dec 2012

Dr Deogratias Harorimana

There is little, and only vague, knowledge of the role or potential of Information and Communications Technologies (ICTs) in addressing poverty in Fiji. This may be due to the newness of the technology in the South Pacific region as a whole, but also to the fact that only 9.7% of Fiji's current population of 931,000 are internet users (ITC Figures 2011). This paper reports findings on how ICTs are contributing towards poverty alleviation in Fiji. On the basis of reviewed best …


Teleconsultation Technology And Its Benefits: In The Case Of Public Hospitals In Malaysia, Nurazean Maarop, Sukdershan Hazara Singh, Khin Than Win Dec 2012

Dr Khin Win

The ultimate objectives of this study are to describe teleconsultation activities and explore the benefits of teleconsultation technology in the context of the public health care environment in Malaysia. The study is based on an exploratory mixed-method design involving semi-structured interviews with key informants and a questionnaire survey of health care providers, to obtain information about existing teleconsultation activities and to uncover the benefits of teleconsultation technology. The notable benefits of teleconsultation implementation are further discussed. The findings confirmed that the teleconsultation service has improved health care delivery in underserved areas by enabling consultation with specialists. One …


Optimal Error Of Query Sets Under The Differentially-Private Matrix Mechanism, Chao Li, Gerome Miklau Dec 2012

Gerome Miklau

A common goal of privacy research is to release synthetic data that satisfies a formal privacy guarantee and can be used by an analyst in place of the original data. To achieve reasonable accuracy, a synthetic data set must be tuned to support a specified set of queries accurately, sacrificing fidelity for other queries.

This work considers methods for producing synthetic data under differential privacy and investigates what makes a set of queries "easy" or "hard" to answer. We consider answering sets of linear counting queries using the matrix mechanism, a recent differentially-private mechanism that can reduce error by adding …


Simulations In 3D Tactics, Interdiction And Multi-Agent Modelling, A. R. Green, I. C. Piper, Daniel Keep, C. J. Flaherty Dec 2012

Dr Ian Piper

The analysis of vulnerabilities in large complex spaces is fundamentally problematic. The lack of capacity to generate a threat assessment merely exacerbates this problem. Also lacking in the current literature is a developed methodology. To overcome this problem, we propose an approach using multi-agent modelling, which is also melded with three-dimensional (3D) tactical understandings. Our approach builds on a microsimulation decision support tool, which was developed for a behavioural simulation of CBRN events. Microsimulation is based on the individual, who has a number of attributes, which are stochastic (when repeated within an attribute). This approach …


Simple, Robust And Accurate Head-Pose Tracking Using A Single Camera, S. Meers, Koren Ward, I. Piper Dec 2012

Dr Ian Piper

This paper describes an inexpensive, robust method for tracking the head position and orientation of the user by using a single low-cost USB camera and infrared light emitting diodes concealed within spectacle frames worn by the user. Unlike gaze and head-pose tracking systems which rely on high-resolution stereo cameras and complex image processing hardware and software to find and track facial features on the user, the proposed system is able to efficiently locate and track the head's orientation and distance relative to the camera with little processing. Due to the infrared light emitting diodes having fixed geometry, the system does …


A ConceptLink Graph For Text Structure Mining, Rowena Chau, Ah Chung Tsoi, Markus Hagenbuchner, Vincent Lee Dec 2012

Dr Markus Hagenbuchner

Most text mining methods are based on representing documents using a vector space model, commonly known as a bag-of-words model, where each document is modeled as a linear vector representing the occurrence of independent words in the text corpus. It is well known that with this vector-based representation, important information, such as semantic relationships among concepts, is lost. This paper proposes a novel text representation model called the ConceptLink graph. The ConceptLink graph not only represents the content of the document, but also captures some of its underlying semantic structure in terms of the relationships among concepts. The …


Computational Capabilities Of Graph Neural Networks, Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, Gabriele Monfardini Dec 2012

Dr Markus Hagenbuchner

In this paper, we will consider the universal approximation properties of a recently introduced neural network model called graph neural network (GNN) which can be used to process structured data inputs, e.g. acyclic, cyclic, directed, or undirected graphs. This class of neural networks implements a function $\tau(G, n) \in \mathbb{R}^m$ that maps a graph $G$ and one of its nodes $n$ onto an m-dimensional Euclidean space. We characterize the functions that can be approximated by GNNs, in probability, up to any prescribed degree of precision. This set contains the maps that satisfy a property, called preservation of the unfolding …


The Graph Neural Network Model, Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, Gabriele Monfardini Dec 2012

Dr Markus Hagenbuchner

Many underlying relationships among data in several areas of science and engineering, e.g. computer vision, molecular chemistry, molecular biology, pattern recognition, data mining, can be represented in terms of graphs. In this paper, we propose a new neural network model, called graph neural network (GNN) model, that extends existing neural network methods for processing the data represented in the graph domain. This GNN model, which can directly process most of the practically useful types of graphs, e.g. acyclic, cyclic, directed, un-directed, implements a transduction function $\tau(G,n)\in\mathbb{R}^m$ that maps a graph $G$ and one of its nodes $n$ into an m-dimensional …


Ranking Attack Graphs With Graph Neural Networks, Liang Lu, Rei Safavi-Naini, Markus Hagenbuchner, Willy Susilo, Jeffrey Horton, Sweah Liang Yong, Ah Chung Tsoi Dec 2012

Dr Markus Hagenbuchner

Network security analysis based on attack graphs has been applied extensively in recent years. The ranking of nodes in an attack graph is an important step towards analyzing network security. This paper proposes an alternative attack graph ranking scheme based on a recent approach to machine learning in a structured graph domain, namely, Graph Neural Networks (GNNs). Evidence is presented in this paper that the GNN is suitable for the task of ranking attack graphs by learning a ranking function from examples and generalizing the function to unseen, possibly noisy data, thus showing that the GNN provides an effective alternative …


Data Curation Is For Everyone! The Case For Master's And Baccalaureate Institutional Engagement With Data Curation, Yasmeen Shorish Dec 2012

Yasmeen Shorish

This article describes the fundamental challenges to data curation, how these challenges may be compounded for smaller institutions, and how data management is an essential and manageable component of data curation. Data curation is often discussed within the confines of large, research universities. As a result, master’s and baccalaureate institutions may be left with the impression that they cannot engage with data curation. However, by proactively engaging with faculty, libraries of all sizes can build closer relationships and help educate faculty on data documentation and organization best practices. Experiences from one master’s comprehensive institution as it engages with data management …


A Model For Coherent Distributed Systems, Robert L. Brown, Peter J. Denning, Walter F. Tichy Dec 2012

Dr Robert Brown

No abstract provided.


Automatic, Remote Status Lights For VAX UNIX, Douglas E. Comer, Robert L. Brown Dec 2012

Dr Robert Brown

No abstract provided.


Should Distributed Systems Be Hidden?, Peter J. Denning, Robert L. Brown Dec 2012

Dr Robert Brown

No abstract provided.