Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Articles 31 - 60 of 126
Full-Text Articles in Physical Sciences and Mathematics
Theory And Practice, Do They Match? A Case With Spectrum-Based Fault Localization, Tien-Duy B. Le, Ferdian Thung, David Lo
David LO
Spectrum-based fault localization refers to the process of identifying program units that are buggy from two sets of execution traces: normal traces and faulty traces. These approaches use statistical formulas to measure the suspiciousness of program units based on the execution traces. There have been many spectrum-based fault localization approaches proposing various formulas in the literature. Two of the best performing and well-known ones are Tarantula and Ochiai. Recently, Xie et al. find that theoretically, under certain assumptions, two families of spectrum-based fault localization formulas outperform all other formulas including those of Tarantula and Ochiai. In this work, we empirically …
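As a rough illustration of the two formulas named in this abstract, the sketch below computes Tarantula and Ochiai suspiciousness scores from per-test coverage. The function names, the `coverage`/`outcomes` data layout, and the zero-division handling are assumptions made for this example; they are not taken from the paper.

```python
import math


def tarantula(ef, ep, total_failed, total_passed):
    """Tarantula score: failing-coverage ratio over the sum of both ratios."""
    fail_ratio = ef / total_failed if total_failed else 0.0
    pass_ratio = ep / total_passed if total_passed else 0.0
    denom = fail_ratio + pass_ratio
    return fail_ratio / denom if denom else 0.0


def ochiai(ef, ep, total_failed, total_passed):
    """Ochiai score: ef / sqrt(total_failed * (ef + ep))."""
    denom = math.sqrt(total_failed * (ef + ep))
    return ef / denom if denom else 0.0


def rank_units(coverage, outcomes, formula):
    """Rank program units from most to least suspicious.

    coverage: dict mapping test name -> set of executed units (assumed layout)
    outcomes: dict mapping test name -> True if the test passed
    """
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    total_passed = len(outcomes) - total_failed
    units = set()
    for executed in coverage.values():
        units |= executed
    scores = {}
    for unit in units:
        # ef / ep: number of failed / passed tests that executed this unit
        ef = sum(1 for t, cov in coverage.items() if unit in cov and not outcomes[t])
        ep = sum(1 for t, cov in coverage.items() if unit in cov and outcomes[t])
        scores[unit] = formula(ef, ep, total_failed, total_passed)
    # Highest score first; ties broken by unit name for a stable order
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy data: unit "c" is executed only by the failing test
coverage = {"t1": {"a", "b"}, "t2": {"a", "c"}, "t3": {"a", "b"}}
outcomes = {"t1": True, "t2": False, "t3": True}
ranking = rank_units(coverage, outcomes, ochiai)
```

In this toy data set both formulas place unit `c` at the top of the ranking, since it is covered by the single failing test and by no passing test.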
R-Energy For Evaluating Robustness Of Dynamic Networks, Ming Gao, Ee Peng Lim, David Lo
David LO
The robustness of a network is determined by how well its vertices are connected to one another so as to keep the network strong and sustainable. As the network evolves, its robustness changes and may reveal events as well as periodic trend patterns that affect the interactions among users in the network. In this paper, we develop R-energy as a new measure of network robustness based on the spectral analysis of the normalized Laplacian matrix. R-energy can cope with disconnected networks, and is efficient to compute with a time complexity of O(|V| + |E|) where V and E are …
Accurate Developer Recommendation For Bug Resolution, Xin Xia, David Lo, Xinyu Wang, Bo Zhou
David LO
Bug resolution refers to the activity that developers perform to diagnose, fix, test, and document bugs during software development and maintenance. It is a collaborative activity among developers who contribute their knowledge, ideas, and expertise to resolve bugs. Given a bug report, we would like to recommend the set of bug resolvers that could potentially contribute their knowledge to fix it. We refer to this problem as developer recommendation for bug resolution. In this paper, we propose a new and accurate method named DevRec for the developer recommendation problem. DevRec is a composite method which performs two kinds of analysis: …
An Automated Approach For Finding Variable-Constant Pairing Bugs, Julia Lawall, David Lo
David LO
Named constants are used heavily in operating systems code, both as internal flags and in interactions with devices. Decision making within an operating system thus critically depends on the correct usage of these values. Nevertheless, compilers for the languages typically used in implementing operating systems provide little support for checking the usage of named constants. This affects correctness, when a constant is used in a context where its value is meaningless, and software maintenance, when a constant has the right value for its usage context but the wrong name. We propose a hybrid program-analysis and data-mining based approach to identify …
Extended Comprehensive Study Of Association Measures For Fault Localization, Lucia Lucia, David Lo, Lingxiao Jiang, Ferdian Thung, Aditya Budi
David LO
Spectrum-based fault localization is a promising approach to automatically locate root causes of failures quickly. Two well-known spectrum-based fault localization techniques, Tarantula and Ochiai, measure how likely a program element is a root cause of failures based on profiles of correct and failed program executions. These techniques are conceptually similar to association measures that have been proposed in statistics, data mining, and have been utilized to quantify the relationship strength between two variables of interest (e.g., the use of a medicine and the cure rate of a disease). In this paper, we view fault localization as a measurement of the …
Automated Library Recommendation, Ferdian Thung, David Lo, Julia Lawall
David LO
Many third party libraries are available to be downloaded and used. Using such libraries can reduce development time and make the developed software more reliable. However, developers are often unaware of suitable libraries to be used for their projects and thus they miss out on these benefits. To help developers better take advantage of the available libraries, we propose a new technique that automatically recommends libraries to developers. Our technique takes as input the set of libraries that an application currently uses, and recommends other libraries that are likely to be relevant. We follow a hybrid approach that combines association …
Boat: An Experimental Platform For Researchers To Comparatively And Reproducibly Evaluate Bug Localization Techniques, Xinyu Wang, David Lo, Xin Xia, Xingen Wang, Pavneet Singh Kochhar, Yuan Tian, Xiaohu Yang, Shanping Li, Jianling Sun, Bo Zhou
David LO
Bug localization refers to the process of identifying source code files that contain defects from descriptions of these defects which are typically contained in bug reports. There have been many bug localization techniques proposed in the literature. However, often it is hard to compare these techniques since different evaluation datasets are used. At times the datasets are not made publicly available and thus it is difficult to reproduce reported results. Furthermore, some techniques are only evaluated on small datasets and thus it is not clear whether the results are generalizable. Thus, there is a need for a platform that allows …
F-Trail: Finding Patterns In Taxi Trajectories, Yasuko Matsubara, Evangelos Papalexakis, Lei Li, David Lo, Yasushi Sakurai, Christos Faloutsos
David LO
Given a large number of taxi trajectories, we would like to find interesting and unexpected patterns from the data. How can we summarize the major trends, and how can we spot anomalies? The analysis of trajectories has been an issue of considerable interest with many applications such as tracking trails of migrating animals and predicting the path of hurricanes. Several recent works propose methods on clustering and indexing trajectories data. However, these approaches are not especially well suited to pattern discovery with respect to the dynamics of social and economic behavior. To further analyze a huge collection of taxi trajectories, …
Empirical Evaluation Of Bug Linking, Tegawendé F. Bissyande, Ferdian Thung, Shaowei Wang, David Lo, Lingxiao Jiang, Laurent Réveillère
David LO
To collect software bugs found by users, development teams often set up bug trackers using systems such as Bugzilla. Developers would then fix some of the bugs and commit corresponding code changes into version control systems such as svn or git. Unfortunately, the links between bug reports and code changes are missing for many software projects as the bug tracking and version control systems are often maintained separately. Yet, linking bug reports to fix commits is important as it could shed light on the nature of bug fixing processes and expose patterns in software management. Bug linking solutions, such as ReLink, …
Automatic Defect Categorization, Ferdian Thung, David Lo, Lingxiao Jiang
David LO
Defects are prevalent in software systems. In order to understand defects better, industry practitioners often categorize bugs into various types. One common kind of categorization is IBM’s Orthogonal Defect Classification (ODC). ODC proposes various orthogonal classifications of defects based on much information about the defects, such as the symptoms and semantics of the defects, the root cause analysis of the defects, and many more. With these category labels, developers can better perform post-mortem analysis to find out what the common characteristics of the defects that plague a particular software project are. Despite the benefits of having these categories, for …
A Comparative Study Of Supervised Learning Algorithms For Re-Opened Bug Prediction, Xin Xia, David Lo, Xinyu Wang, Xiaohu Yang, Shanping Li, Jianling Sun
David LO
Bug fixing is a time-consuming and costly job which is performed throughout the whole life cycle of software development and maintenance. For many systems, bugs are managed in bug management systems such as Bugzilla. Generally, the status of a typical bug report in Bugzilla changes from new to assigned, verified and closed. However, some bugs have to be reopened. Reopened bugs increase the software development and maintenance cost, increase the workload of bug fixers, and might even delay the future delivery of software. Only a few studies investigate the phenomenon of reopened bug reports. In this paper, we evaluate …
An Empirical Study On Developer Interactions In Stackoverflow, Shaowei Wang, David Lo, Lingxiao Jiang
David LO
No abstract provided.
Understanding Widespread Changes: A Taxonomic Study, Shaowei Wang, David Lo, Lingxiao Jiang
David LO
Many active research studies in software engineering, such as detection of recurring bug fixes, detection of copy-and-paste bugs, and automated program transformation tools, are motivated by the assumption that many code changes (e.g., changing an identifier name) in software systems are widespread to many locations and are similar to one another. However, there is no study so far that actually analyzes widespread changes in software systems. Understanding the nature of widespread changes could empirically support the assumption, which provides insight to improve the research studies and related tools. Our study in this paper addresses such a need. We propose …
Diffusion Of Software Features: An Exploratory Study, Ferdian Thung, David Lo, Lingxiao Jiang
David LO
New features are frequently proposed in many software libraries. These features include new methods, classes, packages, etc. These features are utilized in many open source and commercial software systems. Some of these features are adopted very quickly, while others take a long time to be adopted. Each feature takes considerable resources to develop, test, and document. Library developers and managers need to decide what feature to prioritize and what to develop next. As a first step to aid these stakeholders, we perform an exploratory study on the diffusion or rate of adoption of features in the Java Development Kit (JDK) library. …
Predicting Project Outcome Leveraging Socio-Technical Network Patterns, Didi Surian, Yuan Tian, David Lo, Hong Cheng, Ee Peng Lim
David LO
Many software projects are started daily; some are successful, while others are not. Successful projects get completed, are used by many people, and bring benefits to users. Failed projects do not bring similar benefits. In this work, we are interested in developing an effective machine learning solution that predicts project outcome (i.e., success or failure) from the developer socio-technical network. To do so, we investigate successful and failed projects to find factors that differentiate the two. We analyze the socio-technical aspect of the software development process by focusing on the people that contribute to these projects and the interactions among …
Network Structure Of Social Coding In Github, Ferdian Thung, Tegawende F. Bissyande, David Lo, Lingxiao Jiang
David LO
Social coding enables a different experience of software development as the activities and interests of one developer are easily advertised to other developers. Developers can thus track the activities relevant to various projects in one umbrella site. Such a major change in collaborative software development makes an investigation of networking on social coding sites valuable. Furthermore, project hosting platforms promoting this development paradigm have been thriving, among which GitHub has arguably gained the most momentum. In this paper, we contribute to the body of knowledge on social coding by investigating the network structure of social coding in GitHub. We collect …
Adoption Of Software Testing In Open Source Projects: A Preliminary Study On 50,000 Projects, Pavneet Singh Kochhar, Tegawende F. Bissyande, David Lo, Lingxiao Jiang
David LO
In software engineering, testing is a crucial activity that is designed to ensure the quality of program code. For this activity, development teams spend substantial resources constructing test cases to thoroughly assess the correctness of software functionality. What, however, is the proportion of open source projects that include test cases? What kind of projects are more likely to include test cases? In this study, we explore 50,000 projects and investigate the correlation between the presence of test cases and various project development characteristics, including the lines of code and the size of development teams.
Semantically Related Software Terms And Their Taxonomy By Leveraging Collaborative Tagging, Shaowei Wang, David Lo, Lingxiao Jiang
David LO
Millions of people, including those in the software engineering communities, have turned to microblogging services, such as Twitter, as a means to quickly disseminate information. A number of past studies by Treude et al., Storey, and Yuan et al. have shown that a wealth of interesting information is stored in these microblogs. However, microblogs also contain a large amount of noisy content that is less relevant to software developers in engineering software systems. In this work, we perform a preliminary study to investigate the feasibility of automatic classification of microblogs into two categories: relevant and irrelevant to engineering software systems. …
When Would This Bug Get Reported?, Ferdian Thung, David Lo, Lingxiao Jiang, Lucia Lucia, Foyzur Rahman, Premkumar Devanbu
David LO
Millions of people, including those in the software engineering communities, have turned to microblogging services, such as Twitter, as a means to quickly disseminate information. A number of past studies by Treude et al., Storey, and Yuan et al. have shown that a wealth of interesting information is stored in these microblogs. However, microblogs also contain a large amount of noisy content that is less relevant to software developers in engineering software systems. In this work, we perform a preliminary study to investigate the feasibility of automatic classification of microblogs into two categories: relevant and irrelevant to engineering software systems. …
Mining Indirect Antagonistic Communities From Social Interactions, Kuan Zhang, David Lo, Ee Peng Lim, Philips Kokoh Prasetyo
David LO
Antagonistic communities refer to groups of people with opposite tastes, opinions, and factions within a community. Given a set of interactions among people in a community, we develop a novel pattern mining approach to mine a set of antagonistic communities. In particular, based on a set of user-specified thresholds, we extract a set of pairs of communities that behave in opposite ways with one another. We focus on extracting a compact lossless representation based on the concept of closed patterns to prevent exploding the number of mined antagonistic communities. We also present a variation of the algorithm using a divide …
Searching Connected Api Subgraph Via Text Phrases, Wing-Kwan Chan, Hong Cheng, David Lo
David LO
Reusing APIs of existing libraries is a common practice during software development, but searching for suitable APIs and their usages can be time-consuming [6]. In this paper, we study a new and more practical approach to help users find usages of APIs given only simple text phrases, when users have limited knowledge about an API library. We model API invocations as an API graph and aim to find an optimum connected subgraph that meets users' search needs. The problem is challenging since the search space in an API graph is huge. We start with a greedy subgraph search algorithm which …
Learning Extended Fsa From Software: An Empirical Assessment, David Lo, Leonardo Mariani, Mauro Santoro
David LO
A number of techniques that infer finite state automata from execution traces have been used to support test and analysis activities. Some of these techniques can produce automata that integrate information about the data-flow, that is, they also represent how data values affect the operations executed by programs. The integration of information about operation sequences and data values into a unique model is indeed conceptually useful to accurately represent the behavior of a program. However, it is still unclear whether handling heterogeneous types of information, such as operation sequences and data values, necessarily produces higher quality models or not. In …
Scenario-Based And Value-Based Specification Mining: Better Together, David Lo, Shahar Maoz
David LO
Specification mining takes execution traces as input and extracts likely program invariants, which can be used for comprehension, verification, and evolution related tasks. In this work we integrate scenario-based specification mining, which uses a data-mining algorithm to suggest ordering constraints in the form of live sequence charts, an inter-object, visual, modal, scenario-based specification language, with mining of value-based invariants, which detects likely invariants holding at specific program points. The key to the integration is a technique we call scenario-based slicing, running on top of the mining algorithms to distinguish the scenario-specific invariants from the general ones. The resulting suggested specifications …
Observatory Of Trends In Software Related Microblogs, Achananuparp Palakorn, Nelman Lubis Ibrahim, Yuan Tian, David Lo, Ee Peng Lim
David LO
Microblogging has recently become a popular means to disseminate information among millions of people. Interestingly, software developers also use microblogs to communicate with one another. Different from traditional media, microblog users tend to focus on recency and informality of content. Many tweets are more personal and opinionated than traditional news reports. Thus, by analyzing microblogs, one could get up-to-date information about what people are interested in or feel toward a particular topic. In this paper, we describe our microblog observatory that aggregates more than 70,000 Twitter feeds, captures software-related tweets, and computes trends from …
Kbe-Anonymity: Test Data Anonymization For Evolving Programs, Lucia, David Lo, Lingxiao Jiang, Aditya Budi
David LO
High-quality test data that is useful for effective testing is often available on users’ site. However, sharing data owned by users with software vendors may raise privacy concerns. Techniques are needed to enable data sharing among data owners and the vendors without leaking data privacy. Evolving programs bring additional challenges because data may be shared multiple times for every version of a program. When multiple versions of the data are cross-referenced, private information could be inferred. Although there are studies addressing the privacy issue of data sharing for testing and debugging, little work has explicitly addressed the challenges when programs …
To What Extent Could We Detect Field Defects? An Empirical Study Of False Negatives In Static Bug Finding Tools, Ferdian Thung, Lucia, David Lo, Lingxiao Jiang, Premkumar Devanbu, Foyzur Rahman
David LO
Software defects can cause much loss. Static bug-finding tools are believed to help detect and remove defects. These tools are designed to find programming errors, but do they in fact help prevent actual defects that occur in the field and are reported by users? If these tools had been used, would they have detected these field defects, and generated warnings that would direct programmers to fix them? To answer these questions, we perform an empirical study that investigates the effectiveness of state-of-the-art static bug finding tools on hundreds of reported and fixed defects extracted from three open source programs: Lucene, Rhino, …
Information Retrieval Based Nearest Neighbor Classification For Fine-Grained Bug Severity Prediction, Yuan Tian, David Lo, Chengnian Sun
David LO
Bugs are prevalent in software systems. Some bugs are critical and need to be fixed right away, whereas others are minor and their fixes could be postponed until resources are available. In this work, we propose a new approach leveraging information retrieval, in particular a BM25-based document similarity function, to automatically predict the severity of bug reports. Our approach automatically analyzes bug reports reported in the past along with their assigned severity labels, and recommends severity labels to newly reported bug reports. Duplicate bug reports are utilized to determine what bug report features, be it textual, ordinal, or categorical, are important. …
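The general idea in this abstract, ranking past bug reports by a BM25-style similarity and borrowing their severity labels, can be sketched as follows. This is only an illustration under stated assumptions: it uses the textbook Okapi BM25 formula with a non-negative IDF variant and similarity-weighted voting among the top `k` neighbours. The paper's actual similarity function, feature weighting, and use of duplicate reports are not shown in this excerpt and are not reproduced here; all names and the toy data are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query_tokens):
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def predict_severity(query_tokens, past_reports, k=3):
    """Predict a label by similarity-weighted voting among the k nearest reports.

    past_reports: non-empty list of (tokens, severity_label) pairs (assumed layout)
    """
    docs = [tokens for tokens, _ in past_reports]
    scores = bm25_scores(query_tokens, docs)
    # Most similar first; ties broken by position for a stable order
    nearest = sorted(range(len(docs)), key=lambda i: (-scores[i], i))[:k]
    votes = Counter()
    for i in nearest:
        votes[past_reports[i][1]] += scores[i]
    return votes.most_common(1)[0][0]


# Hypothetical toy history of labelled reports
past = [
    (["crash", "on", "startup"], "critical"),
    (["typo", "in", "label"], "minor"),
    (["crash", "when", "saving"], "critical"),
    (["wrong", "label", "color"], "minor"),
]
label = predict_severity(["crash", "on", "save"], past, k=2)
```

With this toy history, a new report mentioning a crash is matched to the two crash-related reports and receives their label.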
Semantic Patch Inference, Jesper Andersen, Anh Cuong Nguyen, David Lo, Julia Lawall, Siau-Cheng Khoo
David LO
We propose a tool for inferring transformation specifications from a few examples of original and updated code. These transformation specifications may contain multiple code fragments from within a single function, all of which must be present for the transformation to apply. This makes the inferred transformations context sensitive. Our algorithm is based on depth-first search, with pruning. Because it is applied locally to a collection of functions that contain related changes, it is efficient in practice. We illustrate the approach on an example drawn from recent changes to the Linux kernel.
An Empirical Study Of Bugs In Machine Learning Systems, Ferdian Thung, Shaowei Wang, David Lo, Lingxiao Jiang
David LO
Many machine learning systems that include various data mining, information retrieval, and natural language processing code and libraries have been used in real world applications. Search engines, internet advertising systems, and product recommendation systems are sample users of this algorithm-intensive code and these libraries. Machine learning code and toolkits have also been used in many recent studies on software mining and analytics that aim to automate various software engineering tasks. With the increasing number of important applications of machine learning systems, the reliability of such systems is also becoming increasingly important. A necessary step for ensuring reliability of such systems is …
Duplicate Bug Report Detection With A Combination Of Information Retrieval And Topic Modeling, Anh Tuan Nguyen, Tung Nguyen, Tien Nguyen, David Lo, Chengnian Sun
David LO
Detecting duplicate bug reports helps reduce triaging efforts and save time for developers in fixing the same issues. Among several automated detection approaches, text-based information retrieval (IR) approaches have been shown to outperform others in terms of both accuracy and time efficiency. However, those IR-based approaches do not reliably detect duplicate reports on the same technical issues written in different descriptive terms. This paper introduces DBTM, a duplicate bug report detection approach that takes advantage of both IR-based features and topic-based features. DBTM models a bug report as a textual document describing certain technical issue(s), and models duplicate bug …