Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 20 of 20

Full-Text Articles in Physical Sciences and Mathematics

Will Fault Localization Work For These Failures? An Automated Approach To Predict Effectiveness Of Fault Localization Tools, Tien-Duy B. Le, David Lo Jun 2014

David LO

Debugging is a crucial yet expensive activity for improving the reliability of software systems. To reduce debugging cost, various fault localization tools have been proposed. A spectrum-based fault localization tool often outputs an ordered list of program elements sorted by their likelihood of being the root cause of a set of failures (i.e., their suspiciousness scores). Despite the many studies on fault localization, for many bugs the root causes still rank low in the ordered list. This potentially causes developers to distrust fault localization tools. Recently, Parnin and Orso highlighted in their user study that many debuggers …


Drone: Predicting Priority Of Reported Bugs By Multi-Factor Analysis, Yuan Tian, David Lo, Chengnian Sun Jun 2014

Bugs are prevalent. To improve software quality, developers often allow users to report bugs that they found using a bug tracking system such as Bugzilla. Users would specify, among other things, a description of the bug, the component that is affected by the bug, and the severity of the bug. Based on this information, bug triagers would then assign a priority level to the reported bug. As resources are limited, bug reports would be investigated based on their priority levels. This priority assignment process, however, is a manual one. Could we do better? In this paper, we propose an automated …


Multi-Abstraction Concern Localization, Tien-Duy B. Le, Shaowei Wang, David Lo Jun 2014

Concern localization refers to the process of locating code units that match a particular textual description. It takes as input textual documents such as bug reports and feature requests and outputs a list of candidate code units that need to be changed to address the bug reports or feature requests. Many information retrieval (IR) based concern localization techniques have been proposed in the literature. These techniques typically represent code units and textual descriptions as a bag of tokens at one level of abstraction, e.g., each token is a word, or each token is a topic. In this work, we propose …


Theory And Practice, Do They Match? A Case With Spectrum-Based Fault Localization, Tien-Duy B. Le, Ferdian Thung, David Lo Jun 2014

Spectrum-based fault localization refers to the process of identifying program units that are buggy from two sets of execution traces: normal traces and faulty traces. These approaches use statistical formulas to measure the suspiciousness of program units based on the execution traces. There have been many spectrum-based fault localization approaches proposing various formulas in the literature. Two of the best performing and well-known ones are Tarantula and Ochiai. Recently, Xie et al. find that theoretically, under certain assumptions, two families of spectrum-based fault localization formulas outperform all other formulas including those of Tarantula and Ochiai. In this work, we empirically …
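As a rough illustration of how such statistical formulas score program units, the sketch below computes the commonly cited Tarantula and Ochiai suspiciousness scores from per-test coverage. It is not the authors' implementation: the data layout (`coverage`, `outcomes`) and the helper names are assumptions made for this example.

```python
import math


def tarantula(ef, ep, total_failed, total_passed):
    """Tarantula score: share of failing runs relative to all runs covering the unit."""
    fail_ratio = ef / total_failed if total_failed else 0.0
    pass_ratio = ep / total_passed if total_passed else 0.0
    denom = fail_ratio + pass_ratio
    return fail_ratio / denom if denom else 0.0


def ochiai(ef, ep, total_failed):
    """Ochiai score: ef / sqrt(total_failed * (ef + ep))."""
    denom = math.sqrt(total_failed * (ef + ep))
    return ef / denom if denom else 0.0


def rank_units(coverage, outcomes, formula="ochiai"):
    """Rank program units by suspiciousness, most suspicious first.

    coverage: hypothetical mapping test name -> set of executed units
    outcomes: hypothetical mapping test name -> True if the test passed
    """
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    total_passed = sum(1 for passed in outcomes.values() if passed)
    units = set().union(*coverage.values())
    scores = {}
    for unit in units:
        # ef / ep: number of failing / passing tests that executed this unit
        ef = sum(1 for t, cov in coverage.items() if unit in cov and not outcomes[t])
        ep = sum(1 for t, cov in coverage.items() if unit in cov and outcomes[t])
        if formula == "tarantula":
            scores[unit] = tarantula(ef, ep, total_failed, total_passed)
        else:
            scores[unit] = ochiai(ef, ep, total_failed)
    # Sort by descending score; break ties by unit name for determinism
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    coverage = {
        "t1": {"s1", "s2"},
        "t2": {"s1", "s3"},
        "t3": {"s1", "s2", "s3"},
    }
    outcomes = {"t1": True, "t2": False, "t3": True}
    for unit, score in rank_units(coverage, outcomes):
        print(unit, round(score, 3))
```

In this toy input, the unit executed by the single failing test and by the fewest passing tests ranks first under both formulas.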


Empirical Evaluation Of Bug Linking, Tegawendé F. Bissyande, Ferdian Thung, Shaowei Wang, David Lo, Lingxiao Jiang, Laurent Réveillère Apr 2013

To collect software bugs found by users, development teams often set up bug trackers using systems such as Bugzilla. Developers would then fix some of the bugs and commit corresponding code changes into version control systems such as svn or git. Unfortunately, the links between bug reports and code changes are missing for many software projects, as the bug tracking and version control systems are often maintained separately. Yet, linking bug reports to fix commits is important as it could shed light on the nature of bug fixing processes and expose patterns in software management. Bug linking solutions, such as ReLink, …


Automatic Defect Categorization, Ferdian Thung, David Lo, Lingxiao Jiang Apr 2013

Defects are prevalent in software systems. In order to understand defects better, industry practitioners often categorize bugs into various types. One common kind of categorization is IBM's Orthogonal Defect Classification (ODC). ODC proposes various orthogonal classifications of defects based on information about the defects, such as their symptoms and semantics, their root cause analysis, and more. With these category labels, developers can better perform post-mortem analysis to find out the common characteristics of the defects that plague a particular software project. Despite the benefits of having these categories, for …


Starcraft II In-Game Action Lists, Wei Gong, Ee Peng Lim, Palakorn Achananuparp, Feida Zhu, David Lo, Freddy Chua Aug 2012

1732 event logs of actions performed by players in Starcraft II public replays downloaded from GameReplays.org.

The Data Set consists of 1732 log files (Size: 55 MB) compressed into a Zip archive (Size: 9.3 MB).


Non-Redundant Sequential Rules, Theory And Algorithm, David Lo, Siau-Cheng Khoo, Limsoon Wong Nov 2011

A sequential rule expresses a relationship between two series of events happening one after another. Sequential rules are potentially useful for analyzing data in sequential format, ranging from purchase histories and network logs to program execution traces. In this work, we investigate and propose a syntactic characterization of a non-redundant set of sequential rules built upon past work on compact sets of representative patterns. A rule is redundant if it can be inferred from another rule having the same support and confidence. When using the set of mined rules as a composite filter, replacing a full set of rules with a …
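To make the notions of support and confidence concrete, here is a minimal sketch for a rule `pre -> post` over a small sequence database. The definitions used are assumptions chosen for illustration (support counts sequences containing `pre` followed by `post` as a subsequence; confidence divides that by the number of sequences containing `pre`) and may differ in detail from the paper's. The function names are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the events of `pattern` occur in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `event in remaining` advances the iterator past the first match,
    # so later events must be found after earlier ones.
    return all(event in remaining for event in pattern)


def rule_support_confidence(pre, post, database):
    """Return (support, confidence) of the rule pre -> post.

    Assumed definitions for this sketch:
      support    = number of sequences containing pre followed by post
      confidence = support / number of sequences containing pre
    """
    pre_count = sum(1 for seq in database if is_subsequence(pre, seq))
    both_count = sum(1 for seq in database if is_subsequence(pre + post, seq))
    confidence = both_count / pre_count if pre_count else 0.0
    return both_count, confidence


if __name__ == "__main__":
    traces = [
        ["lock", "read", "unlock"],
        ["lock", "write"],
        ["open", "lock", "unlock", "close"],
        ["read"],
    ]
    print(rule_support_confidence(["lock"], ["unlock"], traces))
```

Under these assumed definitions, two rules with identical support and confidence where one can be inferred from the other are the candidates for pruning that the abstract calls redundant.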


Matching Dependence-Related Queries In The System Dependence Graph, Xiaoyin Wang, David Lo, Jiefeng Cheng, Lu Zhang, Hong Mei, Jeffrey Xu Yu Nov 2011

In software maintenance and evolution, developers commonly want to apply a change to a number of similar places. Due to the size and complexity of the code base, it is challenging for developers to locate all the places that need the change. A main challenge is that these places share certain common dependence conditions, but existing code searching techniques can hardly handle dependence relations satisfactorily. In this paper, we propose a technique that enables developers to make queries involving dependence conditions and textual conditions on the system dependence graph …


Mining Iterative Generators And Representative Rules For Software Specification Discovery, David Lo, Jinyan Li, Limsoon Wong, Siau-Cheng Khoo Nov 2011

Billions of dollars are spent annually on software-related costs. It is estimated that up to 45 percent of software cost is due to the difficulty in understanding existing systems when performing maintenance tasks (e.g., adding features or removing bugs). One of the root causes is that software products often come with poor or incomplete documented specifications, or none at all. In an effort to improve program understanding, Lo et al. have proposed iterative pattern mining, which outputs patterns that are repeated frequently within a program trace, or across multiple traces, or both. Frequent iterative patterns reflect frequent program behaviors that likely …


Extracting Paraphrases Of Technical Terms From Noisy Parallel Software Corpus, Xiaoyin Wang, David Lo, Jing Jiang, Lu Zhang, Hong Mei Nov 2011

In this paper, we study the problem of extracting technical paraphrases from a parallel software corpus, namely, a collection of duplicate bug reports. Paraphrase acquisition is a fundamental task in the emerging area of text mining for software engineering. Existing paraphrase extraction methods are not entirely suitable here due to the noisy nature of bug reports. We propose a number of techniques to address the noisy data problem. The empirical evaluation shows that our method significantly improves an existing method by up to 58%.


Mining Message Sequence Graphs, Sandeep Kumar, Siau-Cheng Khoo, Abhik Roychoudhury, David Lo Nov 2011

Dynamic specification mining involves discovering software behavior from traces for the purpose of program comprehension and bug detection. However, in concurrent/distributed programs, the inherent partial order relationships among events occurring across processes pose a big challenge to specification mining. In this paper, we propose a framework for mining partial orders so as to understand concurrent program behavior. Our miner takes in a set of concurrent program traces and produces a message sequence graph (MSG) to represent the concurrent program behavior. An MSG is a graph whose nodes are partial orders, represented as Message Sequence Charts. Mining …


Hierarchical Inter-Object Traces For Specification Mining, David Lo, Shahar Maoz Nov 2011

Major challenges of dynamic analysis approaches to specification mining include scalability over long traces as well as comprehensibility and expressivity of results. We present a novel use of object hierarchies over inter-object traces as an abstraction/refinement mechanism enabling scalable, incremental, top-down mining of scenario-based specifications.


Specification Mining Of Symbolic Scenario-Based Models, David Lo, Shahar Maoz Nov 2011

Many dynamic analysis approaches to specification mining, which extract behavioral models from execution traces, do not consider object identities. This limits their power when used to analyze traces of general object oriented programs. In this work we present a novel specification mining approach that considers object identities, and, moreover, generalizes from specifications involving concrete objects to their symbolic class-level abstractions. Our approach uses data mining methods to extract significant scenario-based specifications in the form of Damm and Harel's live sequence charts (LSC), a formal and expressive extension of classic sequence diagrams. We guarantee that all mined symbolic LSCs are significant …


Mining Closed Discriminative Dyadic Sequential Patterns, David Lo, Hong Cheng, Lucia Nov 2011

A lot of data are in sequential formats. In this study, we are interested in sequential data that come in pairs. There are many interesting datasets in this format coming from various domains, including parallel textual corpora, duplicate bug reports, and other pairs of related sequences of events. Our goal is to mine a set of closed discriminative dyadic sequential patterns from a database of sequence pairs, each belonging to one of the two classes +ve and -ve. These dyadic sequential patterns characterize the discriminating facets contrasting the two classes. They are potentially good features to be used for the …


Mining And Ranking Generators Of Sequential Pattern, David Lo, Siau-Cheng Khoo, Jinyan Li Nov 2011

Sequential pattern mining, first proposed by Agrawal and Srikant, has been researched intensively due to its wide applicability in many real-life domains. Various improvements have been proposed, which include mining a closed set of sequential patterns. Sequential patterns supported by the same sequences in the database can be considered as belonging to an equivalence class. Each equivalence class contains patterns partially ordered by the sub-sequence relationship and having the same support. Within an equivalence class, the sets of maximal and minimal patterns are referred to as closed patterns and generators respectively. Generators used together with closed patterns can provide additional information …
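The relationship between equivalence classes, closed patterns, and generators can be sketched on a toy database. The code below is an illustrative brute-force enumeration, not the mining algorithm from the paper: it enumerates every subsequence (exponential in sequence length), groups patterns by the set of sequences that support them, and then picks the maximal and minimal members of each group. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def is_subsequence(pattern, sequence):
    """True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(event in remaining for event in pattern)


def equivalence_classes(database):
    """Group all candidate patterns by the set of sequences supporting them.

    Brute force for illustration only: enumerates every non-empty
    subsequence of every sequence in the database.
    """
    candidates = {
        combo
        for seq in database
        for size in range(1, len(seq) + 1)
        for combo in combinations(seq, size)
    }
    classes = defaultdict(set)
    for pattern in candidates:
        supporters = frozenset(
            i for i, seq in enumerate(database) if is_subsequence(pattern, seq)
        )
        classes[supporters].add(pattern)
    return classes


def closed_and_generators(patterns):
    """Return (closed, generators): maximal and minimal patterns of one class."""
    closed = {
        p for p in patterns
        if not any(p != q and is_subsequence(p, q) for q in patterns)
    }
    generators = {
        p for p in patterns
        if not any(p != q and is_subsequence(q, p) for q in patterns)
    }
    return closed, generators


if __name__ == "__main__":
    db = [("a", "b", "c"), ("a", "b", "c"), ("a", "d")]
    for supporters, patterns in equivalence_classes(db).items():
        closed, gens = closed_and_generators(patterns)
        print(sorted(supporters), "closed:", sorted(closed), "generators:", sorted(gens))
```

In this example, the class supported by the first two sequences has one closed pattern, `("a", "b", "c")`, and two generators, `("b",)` and `("c",)`: the shortest patterns that already pin down the same supporting sequences.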


Mining Temporal Rules For Software Maintenance, David Lo, Siau-Cheng Khoo, Chao Liu Nov 2011

Software evolution incurs difficulties in program comprehension and software verification, and hence it increases the cost of software maintenance. In this study, we propose a novel technique to mine from program execution traces a sound and complete set of statistically significant temporal rules of arbitrary lengths. The extracted temporal rules reveal invariants that the program observes, and will consequently guide developers to understand the program behaviors and facilitate downstream applications such as verification and debugging. Unlike previous studies that were restricted to mining two-event rules (e.g., (lock) → (unlock)), our algorithm discovers rules of arbitrary lengths. In order to …


An Automated Approach For Finding Variable-Constant Pairing Bugs, Julia Lawall, David Lo Nov 2011

Named constants are used heavily in operating systems code, both as internal flags and in interactions with devices. Decision making within an operating system thus critically depends on the correct usage of these values. Nevertheless, compilers for the languages typically used in implementing operating systems provide little support for checking the usage of named constants. This affects correctness, when a constant is used in a context where its value is meaningless, and software maintenance, when a constant has the right value for its usage context but the wrong name. We propose a hybrid program-analysis and data-mining based approach to identify …


Scenario-Based And Value-Based Specification Mining: Better Together, David Lo, Shahar Maoz Nov 2011

Specification mining takes execution traces as input and extracts likely program invariants, which can be used for comprehension, verification, and evolution related tasks. In this work we integrate scenario-based specification mining, which uses data-mining algorithms to suggest ordering constraints in the form of live sequence charts, an inter-object, visual, modal, scenario-based specification language, with mining of value-based invariants, which detects likely invariants holding at specific program points. The key to the integration is a technique we call scenario-based slicing, running on top of the mining algorithms to distinguish the scenario-specific invariants from the general ones. The resulting suggested specifications are …


LM: A Miner For Scenario-Based Specifications, Tuan Anh Doan, David Lo, Shahar Maoz, Siau-Cheng Khoo Nov 2011

We present LM, a tool for mining scenario-based specifications in the form of Live Sequence Charts, a visual language that extends sequence diagrams with modalities. LM comes with a project management component, a wizard-like interface to the mining algorithm, a set of pre- and postprocessing extensions, and a visualization module.