Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 31 - 60 of 126

Full-Text Articles in Physical Sciences and Mathematics

Theory And Practice, Do They Match? A Case With Spectrum-Based Fault Localization, Tien-Duy B. Le, Ferdian Thung, David Lo Jun 2014

David LO

Spectrum-based fault localization refers to the process of identifying program units that are buggy from two sets of execution traces: normal traces and faulty traces. These approaches use statistical formulas to measure the suspiciousness of program units based on the execution traces. There have been many spectrum-based fault localization approaches proposing various formulas in the literature. Two of the best performing and well-known ones are Tarantula and Ochiai. Recently, Xie et al. find that theoretically, under certain assumptions, two families of spectrum-based fault localization formulas outperform all other formulas including those of Tarantula and Ochiai. In this work, we empirically …
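As an illustration of the statistical formulas mentioned in this abstract, the widely cited Tarantula and Ochiai suspiciousness scores can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function names, the `spectra` dictionary, and its toy counts are hypothetical and chosen only for this example.

```python
import math

def tarantula(ef, ep, total_failed, total_passed):
    """Tarantula suspiciousness of one program unit.

    ef / ep: number of failed / passed runs that executed the unit.
    """
    fail_ratio = ef / total_failed if total_failed else 0.0
    pass_ratio = ep / total_passed if total_passed else 0.0
    denom = fail_ratio + pass_ratio
    return fail_ratio / denom if denom else 0.0

def ochiai(ef, ep, total_failed):
    """Ochiai suspiciousness: ef / sqrt(total_failed * (ef + ep))."""
    denom = math.sqrt(total_failed * (ef + ep))
    return ef / denom if denom else 0.0

# Hypothetical spectra (assumed data): unit -> (failed runs covering it, passed runs covering it)
spectra = {"line_10": (2, 0), "line_20": (2, 3), "line_30": (0, 3)}
TOTAL_FAILED, TOTAL_PASSED = 2, 3

# Rank units from most to least suspicious according to Ochiai.
ranking = sorted(
    spectra,
    key=lambda unit: ochiai(*spectra[unit], TOTAL_FAILED),
    reverse=True,
)
```

In this toy data, `line_10` is executed only by failing runs, so both formulas give it the highest score and it comes first in `ranking`.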


R-Energy For Evaluating Robustness Of Dynamic Networks, Ming Gao, Ee Peng Lim, David Lo Jun 2014

David LO

The robustness of a network is determined by how well its vertices are connected to one another so as to keep the network strong and sustainable. As the network evolves its robustness changes and may reveal events as well as periodic trend patterns that affect the interactions among users in the network. In this paper, we develop R-energy as a new measure of network robustness based on the spectral analysis of normalized Laplacian matrix. R-energy can cope with disconnected networks, and is efficient to compute with a time complexity of O(|V| + |E|) where V and E are …


Accurate Developer Recommendation For Bug Resolution, Xin Xia, David Lo, Xinyu Wang, Bo Zhou Jun 2014

David LO

Bug resolution refers to the activity that developers perform to diagnose, fix, test, and document bugs during software development and maintenance. It is a collaborative activity among developers who contribute their knowledge, ideas, and expertise to resolve bugs. Given a bug report, we would like to recommend the set of bug resolvers that could potentially contribute their knowledge to fix it. We refer to this problem as developer recommendation for bug resolution. In this paper, we propose a new and accurate method named DevRec for the developer recommendation problem. DevRec is a composite method which performs two kinds of analysis: …


An Automated Approach For Finding Variable-Constant Pairing Bugs, Julia Lawall, David Lo Jun 2014

David LO

Named constants are used heavily in operating systems code, both as internal flags and in interactions with devices. Decision making within an operating system thus critically depends on the correct usage of these values. Nevertheless, compilers for the languages typically used in implementing operating systems provide little support for checking the usage of named constants. This affects correctness, when a constant is used in a context where its value is meaningless, and software maintenance, when a constant has the right value for its usage context but the wrong name. We propose a hybrid program-analysis and data-mining based approach to identify …


Extended Comprehensive Study Of Association Measures For Fault Localization, Lucia Lucia, David Lo, Lingxiao Jiang, Ferdian Thung, Aditya Budi Jun 2014

David LO

Spectrum-based fault localization is a promising approach to automatically locate root causes of failures quickly. Two well-known spectrum-based fault localization techniques, Tarantula and Ochiai, measure how likely a program element is a root cause of failures based on profiles of correct and failed program executions. These techniques are conceptually similar to association measures that have been proposed in statistics, data mining, and have been utilized to quantify the relationship strength between two variables of interest (e.g., the use of a medicine and the cure rate of a disease). In this paper, we view fault localization as a measurement of the …


Automated Library Recommendation, Ferdian Thung, David Lo, Julia Lawall Jun 2014

David LO

Many third party libraries are available to be downloaded and used. Using such libraries can reduce development time and make the developed software more reliable. However, developers are often unaware of suitable libraries to be used for their projects and thus they miss out on these benefits. To help developers better take advantage of the available libraries, we propose a new technique that automatically recommends libraries to developers. Our technique takes as input the set of libraries that an application currently uses, and recommends other libraries that are likely to be relevant. We follow a hybrid approach that combines association …


BOAT: An Experimental Platform For Researchers To Comparatively And Reproducibly Evaluate Bug Localization Techniques, Xinyu Wang, David Lo, Xin Xia, Xingen Wang, Pavneet Singh Kochhar, Yuan Tian, Xiaohu Yang, Shanping Li, Jianling Sun, Bo Zhou Jun 2014

David LO

Bug localization refers to the process of identifying source code files that contain defects from descriptions of these defects which are typically contained in bug reports. There have been many bug localization techniques proposed in the literature. However, often it is hard to compare these techniques since different evaluation datasets are used. At times the datasets are not made publicly available and thus it is difficult to reproduce reported results. Furthermore, some techniques are only evaluated on small datasets and thus it is not clear whether the results are generalizable. Thus, there is a need for a platform that allows …


F-Trail: Finding Patterns In Taxi Trajectories, Yasuko Matsubara, Evangelos Papalexakis, Lei Li, David Lo, Yasushi Sakurai, Christos Faloutsos Apr 2013

David LO

Given a large number of taxi trajectories, we would like to find interesting and unexpected patterns from the data. How can we summarize the major trends, and how can we spot anomalies? The analysis of trajectories has been an issue of considerable interest with many applications such as tracking trails of migrating animals and predicting the path of hurricanes. Several recent works propose methods on clustering and indexing trajectories data. However, these approaches are not especially well suited to pattern discovery with respect to the dynamics of social and economic behavior. To further analyze a huge collection of taxi trajectories, …


Empirical Evaluation Of Bug Linking, Tegawendé F. Bissyande, Ferdian Thung, Shaowei Wang, David Lo, Lingxiao Jiang, Laurent Réveillère Apr 2013

David LO

To collect software bugs found by users, development teams often set up bug trackers using systems such as Bugzilla. Developers would then fix some of the bugs and commit corresponding code changes into version control systems such as svn or git. Unfortunately, the links between bug reports and code changes are missing for many software projects as the bug tracking and version control systems are often maintained separately. Yet, linking bug reports to fix commits is important as it could shed light on the nature of bug fixing processes and expose patterns in software management. Bug linking solutions, such as ReLink, …


Automatic Defect Categorization, Ferdian Thung, David Lo, Lingxiao Jiang Apr 2013

David LO

Defects are prevalent in software systems. In order to understand defects better, industry practitioners often categorize bugs into various types. One common kind of categorization is IBM’s Orthogonal Defect Classification (ODC). ODC proposes various orthogonal classifications of defects based on much information about the defects, such as the symptoms and semantics of the defects, the root cause analysis of the defects, and many more. With these category labels, developers can better perform post-mortem analysis to find out what the common characteristics of the defects that plague a particular software project are. Despite the benefits of having these categories, for …


A Comparative Study Of Supervised Learning Algorithms For Re-Opened Bug Prediction, Xin Xia, David Lo, Xinyu Wang, Xiaohu Yang, Shanping Li, Jianling Sun Apr 2013

David LO

Bug fixing is a time-consuming and costly job which is performed in the whole life cycle of software development and maintenance. For many systems, bugs are managed in bug management systems such as Bugzilla. Generally, the status of a typical bug report in Bugzilla changes from new to assigned, verified and closed. However, some bugs have to be reopened. Reopened bugs increase the software development and maintenance cost, increase the workload of bug fixers, and might even delay the future delivery of a software. Only a few studies investigate the phenomenon of reopened bug reports. In this paper, we evaluate …


An Empirical Study On Developer Interactions In StackOverflow, Shaowei Wang, David Lo, Lingxiao Jiang Apr 2013

David LO

No abstract provided.


Understanding Widespread Changes: A Taxonomic Study, Shaowei Wang, David Lo, Lingxiao Jiang Apr 2013

David LO

Many active research studies in software engineering, such as detection of recurring bug fixes, detection of copy-and-paste bugs, and automated program transformation tools, are motivated by the assumption that many code changes (e.g., changing an identifier name) in software systems are widespread to many locations and are similar to one another. However, there is no study so far that actually analyzes widespread changes in software systems. Understanding the nature of widespread changes could empirically support the assumption, which provides insight to improve the research studies and related tools. Our study in this paper addresses such a need. We propose …


Diffusion Of Software Features: An Exploratory Study, Ferdian Thung, David Lo, Lingxiao Jiang Apr 2013

David LO

New features are frequently proposed in many software libraries. These features include new methods, classes, packages, etc. These features are utilized in many open source and commercial software systems. Some of these features are adopted very quickly, while others take a long time to be adopted. Each feature takes much resource to develop, test, and document. Library developers and managers need to decide what feature to prioritize and what to develop next. As a first step to aid these stakeholders, we perform an exploratory study on the diffusion or rate of adoption of features in Java Development Kit (JDK) library. …


Predicting Project Outcome Leveraging Socio-Technical Network Patterns, Didi Surian, Yuan Tian, David Lo, Hong Cheng, Ee Peng Lim Apr 2013

David LO

Many software projects are started daily; some are successful, while others are not. Successful projects get completed, are used by many people, and bring benefits to users. Failed projects do not bring similar benefits. In this work, we are interested in developing an effective machine learning solution that predicts project outcome (i.e., success or failure) from developer socio-technical network. To do so, we investigate successful and failed projects to find factors that differentiate the two. We analyze the socio-technical aspect of the software development process by focusing on the people that contribute to these projects and the interactions among …


Network Structure Of Social Coding In GitHub, Ferdian Thung, Tegawende F. Bissyande, David Lo, Lingxiao Jiang Apr 2013

David LO

Social coding enables a different experience of software development as the activities and interests of one developer are easily advertised to other developers. Developers can thus track the activities relevant to various projects in one umbrella site. Such a major change in collaborative software development makes an investigation of networkings on social coding sites valuable. Furthermore, project hosting platforms promoting this development paradigm have been thriving, among which GitHub has arguably gained the most momentum. In this paper, we contribute to the body of knowledge on social coding by investigating the network structure of social coding in GitHub. We collect …


Adoption Of Software Testing In Open Source Projects: A Preliminary Study On 50,000 Projects, Pavneet Singh Kochhar, Tegawende F. Bissyande, David Lo, Lingxiao Jiang Apr 2013

David LO

In software engineering, testing is a crucial activity that is designed to ensure the quality of program code. For this activity, development teams spend substantial resources constructing test cases to thoroughly assess the correctness of software functionality. What, however, is the proportion of open source projects that include test cases? What kind of projects are more likely to include test cases? In this study, we explore 50,000 projects and investigate the correlation between the presence of test cases and various project development characteristics, including the lines of code and the size of development teams.


Semantically Related Software Terms And Their Taxonomy By Leveraging Collaborative Tagging, Shaowei Wang, David Lo, Lingxiao Jiang Dec 2012

David LO

Millions of people, including those in the software engineering communities, have turned to microblogging services, such as Twitter, as a means to quickly disseminate information. A number of past studies by Treude et al., Storey, and Yuan et al. have shown that a wealth of interesting information is stored in these microblogs. However, microblogs also contain a large amount of noisy content that is less relevant to software developers in engineering software systems. In this work, we perform a preliminary study to investigate the feasibility of automatic classification of microblogs into two categories: relevant and irrelevant to engineering software systems. …


When Would This Bug Get Reported?, Ferdian Thung, David Lo, Lingxiao Jiang, Lucia Lucia, Foyzur Rahman, Premkumar Devanbu Dec 2012

David LO

Millions of people, including those in the software engineering communities, have turned to microblogging services, such as Twitter, as a means to quickly disseminate information. A number of past studies by Treude et al., Storey, and Yuan et al. have shown that a wealth of interesting information is stored in these microblogs. However, microblogs also contain a large amount of noisy content that is less relevant to software developers in engineering software systems. In this work, we perform a preliminary study to investigate the feasibility of automatic classification of microblogs into two categories: relevant and irrelevant to engineering software systems. …


Mining Indirect Antagonistic Communities From Social Interactions, Kuan Zhang, David Lo, Ee Peng Lim, Philips Kokoh Prasetyo Dec 2012

David LO

Antagonistic communities refer to groups of people with opposite tastes, opinions, and factions within a community. Given a set of interactions among people in a community, we develop a novel pattern mining approach to mine a set of antagonistic communities. In particular, based on a set of user-specified thresholds, we extract a set of pairs of communities that behave in opposite ways with one another. We focus on extracting a compact lossless representation based on the concept of closed patterns to prevent exploding the number of mined antagonistic communities. We also present a variation of the algorithm using a divide …


Searching Connected API Subgraph Via Text Phrases, Wing-Kwan Chan, Hong Cheng, David Lo Dec 2012

David LO

Reusing APIs of existing libraries is a common practice during software development, but searching for suitable APIs and their usages can be time-consuming [6]. In this paper, we study a new and more practical approach to help users find usages of APIs given only simple text phrases, when users have limited knowledge about an API library. We model API invocations as an API graph and aim to find an optimum connected subgraph that meets users' search needs. The problem is challenging since the search space in an API graph is huge. We start with a greedy subgraph search algorithm which …


Learning Extended FSA From Software: An Empirical Assessment, David Lo, Leonardo Mariani, Mauro Santoro Dec 2012

David LO

A number of techniques that infer finite state automata from execution traces have been used to support test and analysis activities. Some of these techniques can produce automata that integrate information about the data-flow, that is, they also represent how data values affect the operations executed by programs. The integration of information about operation sequences and data values into a unique model is indeed conceptually useful to accurately represent the behavior of a program. However, it is still unclear whether handling heterogeneous types of information, such as operation sequences and data values, necessarily produces higher quality models or not. In …


Scenario-Based And Value-Based Specification Mining: Better Together, David Lo, Shahar Maoz Dec 2012

David LO

Specification mining takes execution traces as input and extracts likely program invariants, which can be used for comprehension, verification, and evolution related tasks. In this work we integrate scenario-based specification mining, which uses a data-mining algorithm to suggest ordering constraints in the form of live sequence charts, an inter-object, visual, modal, scenario-based specification language, with mining of value-based invariants, which detects likely invariants holding at specific program points. The key to the integration is a technique we call scenario-based slicing, running on top of the mining algorithms to distinguish the scenario-specific invariants from the general ones. The resulting suggested specifications …


Observatory Of Trends In Software Related Microblogs, Achananuparp Palakorn, Nelman Lubis Ibrahim, Yuan Tian, David Lo, Ee Peng Lim Dec 2012

David LO

Microblogging has recently become a popular means to disseminate information among millions of people. Interestingly, software developers also use microblogs to communicate with one another. Different from traditional media, microblog users tend to focus on recency and informality of content. Many tweet contents are relatively more personal and opinionated, compared to those of traditional news reports. Thus, by analyzing microblogs, one could get the up-to-date information about what people are interested in or feel toward a particular topic. In this paper, we describe our microblog observatory that aggregates more than 70,000 Twitter feeds, captures software-related tweets, and computes trends from …


Kbe-Anonymity: Test Data Anonymization For Evolving Programs, - Lucia, David Lo, Lingxiao Jiang, Aditya Budi Dec 2012

David LO

High-quality test data that is useful for effective testing is often available on users’ site. However, sharing data owned by users with software vendors may raise privacy concerns. Techniques are needed to enable data sharing among data owners and the vendors without leaking data privacy. Evolving programs bring additional challenges because data may be shared multiple times for every version of a program. When multiple versions of the data are cross-referenced, private information could be inferred. Although there are studies addressing the privacy issue of data sharing for testing and debugging, little work has explicitly addressed the challenges when programs …


To What Extent Could We Detect Field Defects? —An Empirical Study Of False Negatives In Static Bug Finding Tools, Ferdian Thung, - Lucia, David Lo, Lingxiao Jiang, Premkumar Devanbu, Foyzur Rahman Dec 2012

David LO

Software defects can cause much loss. Static bug-finding tools are believed to help detect and remove defects. These tools are designed to find programming errors; but, do they in fact help prevent actual defects that occur in the field and are reported by users? If these tools had been used, would they have detected these field defects, and generated warnings that would direct programmers to fix them? To answer these questions, we perform an empirical study that investigates the effectiveness of state-of-the-art static bug finding tools on hundreds of reported and fixed defects extracted from three open source programs: Lucene, Rhino, …


Information Retrieval Based Nearest Neighbor Classification For Fine-Grained Bug Severity Prediction, Yuan Tian, David Lo, Chengnian Sun Dec 2012

David LO

Bugs are prevalent in software systems. Some bugs are critical and need to be fixed right away, whereas others are minor and their fixes could be postponed until resources are available. In this work, we propose a new approach leveraging information retrieval, in particular BM25-based document similarity function, to automatically predict the severity of bug reports. Our approach automatically analyzes bug reports reported in the past along with their assigned severity labels, and recommends severity labels to newly reported bug reports. Duplicate bug reports are utilized to determine what bug report features, be it textual, ordinal, or categorical, are important. …
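The idea of retrieving similar past reports and reusing their severity labels can be sketched with plain Okapi BM25 and an unweighted nearest-neighbor vote. This is a simplified stand-in under stated assumptions, not the approach evaluated in the paper: the function names, the parameter defaults `k1=1.2` and `b=0.75`, the non-negative IDF variant, the whitespace tokenization, and the toy `history` data are all assumptions made for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            # Non-negative IDF variant (assumed choice for this sketch).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def predict_severity(new_report, history, k=3):
    """Return the majority severity label among the k most similar past reports."""
    docs = [text.split() for text, _ in history]
    scores = bm25_scores(new_report.split(), docs)
    top = sorted(range(len(history)), key=lambda i: scores[i], reverse=True)[:k]
    votes = Counter(history[i][1] for i in top)
    return votes.most_common(1)[0][0]

# Hypothetical labeled history: (report text, severity label).
history = [
    ("crash on startup null pointer", "critical"),
    ("crash when saving file", "critical"),
    ("typo in settings dialog", "minor"),
    ("button label misaligned in dialog", "minor"),
]

label = predict_severity("crash on file open", history, k=2)
```

Here the two nearest neighbors of the new report are the two crash reports, so `label` is `"critical"`. A real system would also need the non-textual features the abstract mentions, which this sketch omits.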


Semantic Patch Inference, Jesper Andersen, Anh Cuong Nguyen, David Lo, Julia Lawall, Siau-Cheng Khoo Dec 2012

David LO

We propose a tool for inferring transformation specifications from a few examples of original and updated code. These transformation specifications may contain multiple code fragments from within a single function, all of which must be present for the transformation to apply. This makes the inferred transformations context sensitive. Our algorithm is based on depth-first search, with pruning. Because it is applied locally to a collection of functions that contain related changes, it is efficient in practice. We illustrate the approach on an example drawn from recent changes to the Linux kernel.


An Empirical Study Of Bugs In Machine Learning Systems, Ferdian Thung, Shaowei Wang, David Lo, Lingxiao Jiang Dec 2012

David LO

Many machine learning systems that include various data mining, information retrieval, and natural language processing code and libraries have been used in real world applications. Search engines, internet advertising systems, and product recommendation systems are sample users of these algorithm-intensive code and libraries. Machine learning code and toolkits have also been used in many recent studies on software mining and analytics that aim to automate various software engineering tasks. With the increasing number of important applications of machine learning systems, the reliability of such systems is also becoming increasingly important. A necessary step for ensuring reliability of such systems is …


Duplicate Bug Report Detection With A Combination Of Information Retrieval And Topic Modeling, Anh Tuan Nguyen, Tung Nguyen, Tien Nguyen, David Lo, Chengnian Sun Dec 2012

David LO

Detecting duplicate bug reports helps reduce triaging efforts and save time for developers in fixing the same issues. Among several automated detection approaches, text-based information retrieval (IR) approaches have been shown to outperform others in terms of both accuracy and time efficiency. However, those IR-based approaches are less effective at detecting duplicate reports that describe the same technical issues in different descriptive terms. This paper introduces DBTM, a duplicate bug report detection approach that takes advantage of both IR-based features and topic-based features. DBTM models a bug report as a textual document describing certain technical issue(s), and models duplicate bug …