Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 25 of 25

Full-Text Articles in Physical Sciences and Mathematics

Semantically Related Software Terms And Their Taxonomy By Leveraging Collaborative Tagging, Shaowei Wang, David Lo, Lingxiao Jiang Dec 2012

David LO

Millions of people, including those in the software engineering communities, have turned to microblogging services, such as Twitter, as a means to quickly disseminate information. A number of past studies by Treude et al., Storey, and Yuan et al. have shown that a wealth of interesting information is stored in these microblogs. However, microblogs also contain a large amount of noisy content that is less relevant to software developers in engineering software systems. In this work, we perform a preliminary study to investigate the feasibility of automatic classification of microblogs into two categories: relevant and irrelevant to engineering software systems. …


When Would This Bug Get Reported?, Ferdian Thung, David Lo, Lingxiao Jiang, Lucia Lucia, Foyzur Rahman, Premkumar Devanbu Dec 2012

David LO

Millions of people, including those in the software engineering communities, have turned to microblogging services, such as Twitter, as a means to quickly disseminate information. A number of past studies by Treude et al., Storey, and Yuan et al. have shown that a wealth of interesting information is stored in these microblogs. However, microblogs also contain a large amount of noisy content that is less relevant to software developers in engineering software systems. In this work, we perform a preliminary study to investigate the feasibility of automatic classification of microblogs into two categories: relevant and irrelevant to engineering software systems. …


Mining Indirect Antagonistic Communities From Social Interactions, Kuan Zhang, David Lo, Ee Peng Lim, Philips Kokoh Prasetyo Dec 2012

David LO

Antagonistic communities refer to groups of people with opposite tastes, opinions, and factions within a community. Given a set of interactions among people in a community, we develop a novel pattern mining approach to mine a set of antagonistic communities. In particular, based on a set of user-specified thresholds, we extract a set of pairs of communities that behave in opposite ways to one another. We focus on extracting a compact lossless representation based on the concept of closed patterns to prevent exploding the number of mined antagonistic communities. We also present a variation of the algorithm using a divide …


Searching Connected Api Subgraph Via Text Phrases, Wing-Kwan Chan, Hong Cheng, David Lo Dec 2012

David LO

Reusing APIs of existing libraries is a common practice during software development, but searching for suitable APIs and their usages can be time-consuming [6]. In this paper, we study a new and more practical approach to help users find usages of APIs given only simple text phrases, when users have limited knowledge about an API library. We model API invocations as an API graph and aim to find an optimum connected subgraph that meets users' search needs. The problem is challenging since the search space in an API graph is very large. We start with a greedy subgraph search algorithm which …


Learning Extended Fsa From Software: An Empirical Assessment, David Lo, Leonardo Mariani, Mauro Santoro Dec 2012

David LO

A number of techniques that infer finite state automata from execution traces have been used to support test and analysis activities. Some of these techniques can produce automata that integrate information about the data-flow, that is, they also represent how data values affect the operations executed by programs. The integration of information about operation sequences and data values into a unique model is indeed conceptually useful to accurately represent the behavior of a program. However, it is still unclear whether handling heterogeneous types of information, such as operation sequences and data values, necessarily produces higher quality models or not. In …


Scenario-Based And Value-Based Specification Mining: Better Together, David Lo, Shahar Maoz Dec 2012

David LO

Specification mining takes execution traces as input and extracts likely program invariants, which can be used for comprehension, verification, and evolution-related tasks. In this work we integrate scenario-based specification mining, which uses a data-mining algorithm to suggest ordering constraints in the form of live sequence charts, an inter-object, visual, modal, scenario-based specification language, with mining of value-based invariants, which detects likely invariants holding at specific program points. The key to the integration is a technique we call scenario-based slicing, running on top of the mining algorithms to distinguish the scenario-specific invariants from the general ones. The resulting suggested specifications …


Observatory Of Trends In Software Related Microblogs, Achananuparp Palakorn, Nelman Lubis Ibrahim, Yuan Tian, David Lo, Ee Peng Lim Dec 2012

David LO

Microblogging has recently become a popular means to disseminate information among millions of people. Interestingly, software developers also use microblogs to communicate with one another. Unlike users of traditional media, microblog users tend to focus on recency and informality of content. Many tweets are more personal and opinionated than traditional news reports. Thus, by analyzing microblogs, one could get up-to-date information about what people are interested in or feel toward a particular topic. In this paper, we describe our microblog observatory that aggregates more than 70,000 Twitter feeds, captures software-related tweets, and computes trends from …


Kbe-Anonymity: Test Data Anonymization For Evolving Programs, Lucia, David Lo, Lingxiao Jiang, Aditya Budi Dec 2012

David LO

High-quality test data that is useful for effective testing is often available at users’ sites. However, sharing data owned by users with software vendors may raise privacy concerns. Techniques are needed to enable data sharing between data owners and vendors without leaking private information. Evolving programs bring additional challenges because data may be shared multiple times for every version of a program. When multiple versions of the data are cross-referenced, private information could be inferred. Although there are studies addressing the privacy issue of data sharing for testing and debugging, little work has explicitly addressed the challenges when programs …


To What Extent Could We Detect Field Defects? —An Empirical Study Of False Negatives In Static Bug Finding Tools, Ferdian Thung, Lucia, David Lo, Lingxiao Jiang, Premkumar Devanbu, Foyzur Rahman Dec 2012

David LO

Software defects can cause much loss. Static bug-finding tools are believed to help detect and remove defects. These tools are designed to find programming errors, but do they in fact help prevent actual defects that occur in the field and are reported by users? If these tools had been used, would they have detected these field defects, and generated warnings that would direct programmers to fix them? To answer these questions, we perform an empirical study that investigates the effectiveness of state-of-the-art static bug finding tools on hundreds of reported and fixed defects extracted from three open source programs: Lucene, Rhino, …


Information Retrieval Based Nearest Neighbor Classification For Fine-Grained Bug Severity Prediction, Yuan Tian, David Lo, Chengnian Sun Dec 2012

David LO

Bugs are prevalent in software systems. Some bugs are critical and need to be fixed right away, whereas others are minor and their fixes could be postponed until resources are available. In this work, we propose a new approach leveraging information retrieval, in particular a BM25-based document similarity function, to automatically predict the severity of bug reports. Our approach automatically analyzes bug reports reported in the past along with their assigned severity labels, and recommends severity labels for newly reported bug reports. Duplicate bug reports are utilized to determine what bug report features, be they textual, ordinal, or categorical, are important. …
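The idea of recommending a severity label from the most similar past reports can be illustrated with a minimal sketch. This is not the authors' implementation: it uses the plain Okapi BM25 formula over whitespace tokens, a score-weighted vote among the top k neighbors, and default parameters (`k1`, `b`, `k`) that are assumptions chosen for illustration. The function names `bm25_scores` and `predict_severity` are hypothetical, and the feature weighting learned from duplicate reports that the abstract mentions is omitted.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`
    using plain Okapi BM25 (k1 and b are illustrative defaults)."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def predict_severity(new_report, past_reports, k=3):
    """Recommend a severity label for `new_report` (a token list).

    `past_reports` is a list of (tokens, severity_label) pairs. The k most
    similar past reports vote for their labels, weighted by BM25 score.
    """
    docs = [tokens for tokens, _ in past_reports]
    scores = bm25_scores(new_report, docs)
    nearest = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]
    votes = Counter()
    for i in nearest:
        votes[past_reports[i][1]] += scores[i]
    return votes.most_common(1)[0][0]


# Hypothetical toy data, invented for this example only.
past = [
    ("app crashes on startup".split(), "critical"),
    ("crash when opening file".split(), "critical"),
    ("typo in settings dialog".split(), "minor"),
    ("button label misaligned in dialog".split(), "minor"),
]
label = predict_severity("crashes on startup after update".split(), past)
```

Weighting votes by similarity score, rather than counting neighbors equally, keeps unrelated reports with a zero score from influencing the recommendation.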


Semantic Patch Inference, Jesper Andersen, Anh Cuong Nguyen, David Lo, Julia Lawall, Siau-Cheng Khoo Dec 2012

David LO

We propose a tool for inferring transformation specifications from a few examples of original and updated code. These transformation specifications may contain multiple code fragments from within a single function, all of which must be present for the transformation to apply. This makes the inferred transformations context sensitive. Our algorithm is based on depth-first search, with pruning. Because it is applied locally to a collection of functions that contain related changes, it is efficient in practice. We illustrate the approach on an example drawn from recent changes to the Linux kernel.


An Empirical Study Of Bugs In Machine Learning Systems, Ferdian Thung, Shaowei Wang, David Lo, Lingxiao Jiang Dec 2012

David LO

Many machine learning systems that include various data mining, information retrieval, and natural language processing code and libraries have been used in real-world applications. Search engines, internet advertising systems, and product recommendation systems are sample users of such algorithm-intensive code and libraries. Machine learning code and toolkits have also been used in many recent studies on software mining and analytics that aim to automate various software engineering tasks. With the increasing number of important applications of machine learning systems, the reliability of such systems is also becoming increasingly important. A necessary step for ensuring reliability of such systems is …


Duplicate Bug Report Detection With A Combination Of Information Retrieval And Topic Modeling, Anh Tuan Nguyen, Tung Nguyen, Tien Nguyen, David Lo, Chengnian Sun Dec 2012

David LO

Detecting duplicate bug reports helps reduce triaging efforts and save time for developers in fixing the same issues. Among several automated detection approaches, text-based information retrieval (IR) approaches have been shown to outperform others in terms of both accuracy and time efficiency. However, those IR-based approaches do not reliably detect duplicate reports that describe the same technical issues in different terms. This paper introduces DBTM, a duplicate bug report detection approach that takes advantage of both IR-based features and topic-based features. DBTM models a bug report as a textual document describing certain technical issue(s), and models duplicate bug …


Automatic Classification Of Software Related Microblogs, Philips Kokoh Prasetyo, David Lo, Achananuparp Palakorn, Yuan Tian, Ee Peng Lim Dec 2012

David LO

Millions of people, including those in the software engineering communities, have turned to microblogging services, such as Twitter, as a means to quickly disseminate information. A number of past studies by Treude et al., Storey, and Yuan et al. have shown that a wealth of interesting information is stored in these microblogs. However, microblogs also contain a large amount of noisy content that is less relevant to software developers in engineering software systems. In this work, we perform a preliminary study to investigate the feasibility of automatic classification of microblogs into two categories: relevant and irrelevant to engineering software systems. …


Interactive Fault Localization Leveraging Simple User Feedbacks, Liang Gong, David Lo, Lingxiao Jiang, Hongyu Zhang Dec 2012

David LO

Millions of people, including those in the software engineering communities, have turned to microblogging services, such as Twitter, as a means to quickly disseminate information. A number of past studies by Treude et al., Storey, and Yuan et al. have shown that a wealth of interesting information is stored in these microblogs. However, microblogs also contain a large amount of noisy content that is less relevant to software developers in engineering software systems. In this work, we perform a preliminary study to investigate the feasibility of automatic classification of microblogs into two categories: relevant and irrelevant to engineering software systems. …


Diversity Maximization Speedup For Fault Localization, Liang Gong, David Lo, Lingxiao Jiang, Hongyu Zhang Dec 2012

David LO

Fault localization is useful for reducing debugging effort. However, many fault localization techniques require a non-trivial number of test cases with oracles, which can determine whether a program behaves correctly for every test input. Test oracle creation is expensive because it can take much manual labeling effort. Given a number of test cases to be executed, it is challenging to minimize the number of test cases requiring manual labeling while still achieving good fault localization accuracy. To address this challenge, this paper presents a novel test case selection strategy based on Diversity Maximization Speedup (DMS). DMS orders a set …


What Does Software Engineering Community Microblog About?, Yuan Tian, Palakorn Achananuparp, Ibrahim Nelman Lubis, David Lo, Ee Peng Lim Aug 2012

David LO

Microblogging is a new trend to communicate and to disseminate information. One microblog post could potentially reach millions of users. Millions of microblogs are generated on a daily basis on popular sites such as Twitter. The popularity of microblogging among programmers, software engineers, and software users has also led to their use of microblogs to communicate software engineering issues, in addition to emails and other traditional communication channels. Understanding how millions of users use microblogs in software engineering related activities would shed light on ways we could leverage the fast evolving microblogging content to aid software development efforts. In this work, …


Improved Duplicate Bug Report Identification, Yuan Tian, Chengnian Sun, David Lo Aug 2012

David LO

Bugs are prevalent in software systems. To improve the reliability of software systems, developers often allow end users to provide feedback on bugs that they encounter. Users can do this by submitting a bug report to a bug report management system such as Bugzilla. This process, however, is uncoordinated and distributed, which means that many users could submit bug reports describing the same problem. These are referred to as duplicate bug reports. The existence of many duplicate bug reports may cause much unnecessary manual effort, as a triager often needs to manually tag bug reports as duplicates. Recently, there …


Identifying Linux Bug Fixing Patches, Tian Yuan, Julia Lawall, David Lo Aug 2012

David LO

In the evolution of an operating system there is a continuing tension between the need to develop and test new features, and the need to provide a stable and secure execution environment to users. A compromise, adopted by the developers of the Linux kernel, is to release new versions, including bug fixes and new features, frequently, while maintaining some older “longterm” versions. This strategy raises the problem of how to identify bug fixing patches that are submitted to the current version but should be applied to the longterm versions as well. The current approach is to rely on the individual …


Human: Creating Memorable Fingerprints Of Mobile Users, Gupta Payas, Kiat Wee Tan, Narayanasamy Ramasubbu, David Lo, Debin Gao, Rajesh Krishna Balan Aug 2012

David LO

In this paper, we present a new way of generating behavioral (not biometric) fingerprints from cellphone usage data. In particular, we explore whether the generated behavioral fingerprints are memorable enough to be remembered by end users. We built a system, called HuMan, that generates fingerprints from cellphone data. To test HuMan, we conducted an extensive user study that involved collecting about one month of continuous usage data (including calls, SMSes, application usage patterns, etc.) from 44 Symbian and Android smartphone users. We evaluated the memorable fingerprints generated from this rich multi-context data by asking each user to answer various …


Understanding Task-Driven Information Flow In Collaborative Networks, Gengxin Miao, Shu Tao, Winnie Cheng, Randy Moulic, Louise E. Moser, David Lo, Xifeng Yan Aug 2012

David LO

Collaborative networks are a special type of social network formed by members who collectively achieve specific goals, such as fixing software bugs and resolving customers’ problems. In such networks, information flow among members is driven by the tasks assigned to the network, and by the expertise of its members to complete those tasks. In this work, we analyze real-life collaborative networks to understand their common characteristics and how information is routed in these networks. Our study shows that collaborative networks exhibit significantly different properties compared with other complex networks. Collaborative networks have truncated power-law node degree distributions and other organizational …


Are Faults Localizable?, Lucia, Thung Ferdian, David Lo, Lingxiao Jiang Aug 2012

David LO

Many fault localization techniques have been proposed to facilitate debugging activities. Most of them attempt to pinpoint the location of faults (i.e., localize faults) based on a set of failing and correct executions and expect debuggers to investigate a certain number of located program elements to find faults. These techniques thus assume that faults are localizable, i.e., only one or a few lines of code that are close to one another are responsible for each fault. However, in reality, are faults localizable? In this work, we investigate hundreds of real faults in several software systems, and find that many faults …
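As background for the kind of technique the abstract refers to, the following is a minimal sketch of spectrum-based fault localization, which ranks program lines by how strongly their execution correlates with failing runs. The Ochiai suspiciousness formula is used here as one common choice; the abstract does not say which techniques the study considers, and the function name `ochiai_ranking` and the toy coverage data are assumptions made for illustration.

```python
import math


def ochiai_ranking(coverage, outcomes):
    """Rank lines by Ochiai suspiciousness.

    coverage: list of sets, the line numbers executed by each test.
    outcomes: list of bools, True if the corresponding test passed.
    Returns (line, score) pairs, most suspicious first.
    """
    total_failed = sum(1 for passed in outcomes if not passed)
    lines = set().union(*coverage)
    scores = {}
    for line in lines:
        # ef / ep: failing / passing tests that execute this line.
        ef = sum(1 for cov, ok in zip(coverage, outcomes) if line in cov and not ok)
        ep = sum(1 for cov, ok in zip(coverage, outcomes) if line in cov and ok)
        denom = math.sqrt(total_failed * (ef + ep))
        scores[line] = ef / denom if denom else 0.0
    # Sort by descending score, breaking ties by line number.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical spectra: three tests, the first and third fail.
coverage = [{1, 2, 3}, {1, 2}, {1, 3}]
outcomes = [False, True, False]
ranking = ochiai_ranking(coverage, outcomes)
```

In this toy input, line 3 is executed only by failing tests and therefore ranks first. A ranking like this implicitly assumes that a fault sits in one or a few nearby lines, which is the assumption the study above sets out to examine.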


Active Refinement Of Clone Anomaly Reports, Lucia, David Lo, Lingxiao Jiang, Aditya Budi Aug 2012

David LO

Software clones have been widely studied in the recent literature and shown to be useful for finding bugs because inconsistent changes among clones in a clone group may indicate potential bugs. However, many inconsistent clone groups are not real bugs (true positives). The excessive number of false positives could easily impede broad adoption of clone-based bug detection approaches. In this work, we aim to improve the usability of clone-based bug detection tools by increasing the rate of true positives found when a developer analyzes anomaly reports. Our idea is to control the number of anomaly reports a user can see at a …


Inferring Class Level Specifications For Distributed Systems, Sandeep Kumar, Siau-Cheng Khoo, Abhik Roychoudhury, David Lo Aug 2012

David LO

Distributed systems often contain many behaviorally similar processes, which are conveniently grouped into classes. In system modeling, it is common to specify such systems by describing the class level behavior, instead of object level behavior. While there have been techniques that mine specifications of such distributed systems from their execution traces, these methods only mine object-level specifications involving concrete process objects. This leads to specifications which are large, hard to comprehend, and sensitive to simple changes in the system (such as the number of objects). In this paper, we develop a class level specification mining framework for distributed systems. A …


Starcraft Ii In-Game Action Lists, Wei Gong, Ee Peng Lim, Palakorn Achananuparp, Feida Zhu, David Lo, Freddy Chua Aug 2012

David LO

1732 event logs of actions performed by players in Starcraft II public replays downloaded from GameReplays.org.

The Data Set consists of 1732 log files (Size: 55 MB) compressed into a Zip archive (Size: 9.3 MB).