Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Full-Text Articles in Physical Sciences and Mathematics

An Empirical Study Of Bugs In Software Build Systems, Xin Xia, Xiaozhen Zhou, David Lo, Xiaoqiong Zhao Jun 2014

David LO

A build system converts source code, libraries, and other data into executable programs by orchestrating the execution of compilers and other tools. The whole building process is managed by a software build system, such as Make, Ant, CMake, Maven, Scons, and QMake. The reliability of software build systems affects the reliability of the build process. In this paper, we perform an empirical study on bugs in software build systems. We analyze four software build systems, Ant, Maven, CMake and QMake, which are typical and widely used and can be used to build Java, C, and C++ systems. We …


Understanding The Genetic Makeup Of Linux Device Drivers, Peter Senna Tschudin, Laurent Reveillere, Lingxiao Jiang, David Lo, Julia Lawall Jun 2014

David LO

No abstract provided.


Orion: A Software Project Search Engine With Integrated Diverse Software Artifacts, Tegawende F. Bissyande, Ferdian Thung, David Lo, Lingxiao Jiang, Laurent Réveillère Jun 2014

David LO

Software projects produce a wealth of data that is leveraged in different tasks and for different purposes: researchers collect project data for building experimental datasets; software programmers reuse code from projects; developers often explore the opportunities for getting involved in the development of a project to gain or offer expertise. Finding relevant projects that suit one's needs is, however, currently challenging with the capabilities of existing search systems. We propose Orion, an integrated search engine architecture that combines information from different types of software repositories from multiple sources to facilitate the construction and execution of advanced search queries. Orion provides …


Predicting Response In Mobile Advertising With Hierarchical Importance-Aware Factorization Machine, Richard Jayadi Oentaryo, Ee Peng Lim, Jia Wei Low, David Lo, Michael Finegold Jun 2014

David LO

Mobile advertising has recently seen dramatic growth, fueled by the global proliferation of mobile phones and devices. The task of predicting ad response is thus crucial for maximizing business revenue. However, ad response data change dynamically over time, and are subject to cold-start situations in which limited history hinders reliable prediction. There is also a need for a robust regression estimation for high prediction accuracy, and good ranking to distinguish the impacts of different ads. To this end, we develop a Hierarchical Importance-aware Factorization Machine (HIFM), which provides an effective generic latent factor framework that incorporates importance weights and hierarchical …


Got Issues? Who Cares About It? A Large Scale Investigation Of Issue Trackers From Github, Tegawende F. Bissyande, David Lo, Lingxiao Jiang, Laurent Reveillere, Jacques Klein, Yves Le Traon Jun 2014

David LO

Feedback from software users constitutes a vital part in the evolution of software projects. By filing issue reports, users help identify and fix bugs, document software code, and enhance the software via feature requests. Many studies have explored issue reports, proposed approaches to enable the submission of higher-quality reports, and presented techniques to sort, categorize and leverage issues for software engineering needs. Who, however, cares about filing issues? What kind of issues are reported in issue trackers? What kind of correlation exists between issue reporting and the success of software projects? In this study, we address the need for answering …


Clustering Of Search Trajectory And Its Application To Parameter Tuning, Linda Lindawati, Hoong Chuin Lau, David Lo Jun 2014

David LO

This paper is concerned with automated classification of Combinatorial Optimization Problem instances for the purpose of instance-specific parameter tuning. We propose the CluPaTra Framework, a generic approach to CLUster instances based on similar PAtterns according to search TRAjectories, and apply it to parameter tuning. The key idea is to use the search trajectory as a generic feature for clustering problem instances. The advantage of using the search trajectory is that it can be obtained from any local-search based algorithm with small additional computation time. We explore and compare two different search trajectory representations, two sequence alignment techniques (to calculate similarities) as well as …


Automatic Recommendation Of Api Methods From Feature Requests, Ferdian Thung, Shaowei Wang, David Lo, Julia Lawall Jun 2014

David LO

Developers often receive many feature requests. To implement these features, developers can leverage various methods from third party libraries. In this work, we propose an automated approach that takes as input a textual description of a feature request. It then recommends methods in library APIs that developers can use to implement the feature. Our recommendation approach learns from records of other changes made to software systems, and compares the textual description of the requested feature with the textual descriptions of various API methods. We have evaluated our approach on more than 500 feature requests of Axis2/Java, CXF, Hadoop Common, HBase, …


Mining Branching-Time Scenarios, Dirk Fahland, David Lo, Shahar Maoz Jun 2014

David LO

Specification mining extracts candidate specifications from existing systems, to be used for downstream tasks such as testing and verification. Specifically, we are interested in the extraction of behavior models from execution traces. In this paper we introduce mining of branching-time scenarios in the form of existential, conditional Live Sequence Charts, using a statistical data-mining algorithm. We show the power of branching scenarios to reveal alternative scenario-based behaviors, which could not be mined by previous approaches. The work contrasts and complements previous works on mining linear-time scenarios. An implementation and evaluation over execution trace sets recorded from several real-world applications shows …


Popularity, Interoperability, And Impact Of Programming Languages In 100,000 Open Source Projects, Tegawende F. Bissyande, Ferdian Thung, David Lo, Lingxiao Jiang, Laurent Réveillère Jun 2014

David LO

Programming languages have been proposed even before the era of the modern computer. As years have gone by, computer resources have increased and application domains have expanded, leading to the proliferation of hundreds of programming languages, each attempting to improve over others or to address new programming paradigms. These languages range from procedural languages like C, to object-oriented languages like Java, to functional languages such as ML and Haskell. Unfortunately, there is a lack of large-scale and comprehensive studies that examine the “popularity”, “interoperability”, and “impact” of various programming languages. To fill this gap, this study investigates a hundred thousand …


An Empirical Study Of Adoption Of Software Testing In Open Source Projects, Pavneet Singh Kochhar, Tegawende F. Bissyande, David Lo, Lingxiao Jiang Jun 2014

David LO

In software engineering, testing is a crucial activity that is designed to ensure the quality of program code. For this activity, software teams spend substantial resources constructing test cases to thoroughly assess the correctness of software functionality. What is the proportion of open source projects that include test cases? What is the effect of the number of developers on the number of test cases? In this study, we explore open source projects and investigate the correlation between the presence of test cases and various project development characteristics, including the number of lines of code, the size of development teams and the …


Automatic Recovery Of Root Causes From Bug-Fixing Changes, Ferdian Thung, David Lo, Lingxiao Jiang Jun 2014

David LO

What is the root cause of this failure? This question is often among the first few asked by software debuggers when they try to address issues raised by a bug report. The root cause consists of the erroneous lines of code that cause a chain of erroneous program states eventually leading to the failure. Bug tracking and source control systems only record the symptoms (e.g., bug reports) and treatments of a bug (e.g., committed changes that fix the bug), but not its root cause. Many treatments contain non-essential changes, which are intermingled with root causes. Reverse engineering the root cause of a …


Tag Recommendation In Software Information Sites, Xin Xia, David Lo, Xinyu Wang, Bo Zhou Jun 2014

David LO

Nowadays, software engineers use a variety of online media to search and become informed of new and interesting technologies, and to learn from and help one another. We refer to these kinds of online media which help software engineers improve their performance in software development, maintenance and test processes as software information sites. It is common to see tags in software information sites and many sites allow users to tag various objects with their own words. Users increasingly use tags to describe the most important features of their posted contents or projects. In this paper, we propose TagCombine, an automatic …


R-Energy For Evaluating Robustness Of Dynamic Networks, Ming Gao, Ee Peng Lim, David Lo Jun 2014

David LO

The robustness of a network is determined by how well its vertices are connected to one another so as to keep the network strong and sustainable. As the network evolves, its robustness changes and may reveal events as well as periodic trend patterns that affect the interactions among users in the network. In this paper, we develop R-energy as a new measure of network robustness based on the spectral analysis of the normalized Laplacian matrix. R-energy can cope with disconnected networks, and is efficient to compute with a time complexity of O(|V| + |E|) where V and E are …


Accurate Developer Recommendation For Bug Resolution, Xin Xia, David Lo, Xinyu Wang, Bo Zhou Jun 2014

David LO

Bug resolution refers to the activity that developers perform to diagnose, fix, test, and document bugs during software development and maintenance. It is a collaborative activity among developers who contribute their knowledge, ideas, and expertise to resolve bugs. Given a bug report, we would like to recommend the set of bug resolvers that could potentially contribute their knowledge to fix it. We refer to this problem as developer recommendation for bug resolution. In this paper, we propose a new and accurate method named DevRec for the developer recommendation problem. DevRec is a composite method which performs two kinds of analysis: …


An Automated Approach For Finding Variable-Constant Pairing Bugs, Julia Lawall, David Lo Jun 2014

David LO

Named constants are used heavily in operating systems code, both as internal flags and in interactions with devices. Decision making within an operating system thus critically depends on the correct usage of these values. Nevertheless, compilers for the languages typically used in implementing operating systems provide little support for checking the usage of named constants. This affects correctness, when a constant is used in a context where its value is meaningless, and software maintenance, when a constant has the right value for its usage context but the wrong name. We propose a hybrid program-analysis and data-mining based approach to identify …


Extended Comprehensive Study Of Association Measures For Fault Localization, Lucia Lucia, David Lo, Lingxiao Jiang, Ferdian Thung, Aditya Budi Jun 2014

David LO

Spectrum-based fault localization is a promising approach to automatically locate root causes of failures quickly. Two well-known spectrum-based fault localization techniques, Tarantula and Ochiai, measure how likely a program element is to be a root cause of failures based on profiles of correct and failed program executions. These techniques are conceptually similar to association measures that have been proposed in statistics and data mining and have been utilized to quantify the relationship strength between two variables of interest (e.g., the use of a medicine and the cure rate of a disease). In this paper, we view fault localization as a measurement of the …
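
For readers unfamiliar with the two techniques named in this abstract, the commonly cited Tarantula and Ochiai suspiciousness scores can be sketched as below. This is a minimal illustration and not the paper's implementation; the function and parameter names are hypothetical, and the inputs are assumed to be simple counts of passing and failing test runs that execute a given program element.

```python
import math


def tarantula(failed_cov, passed_cov, total_failed, total_passed):
    """Tarantula score: share of failing runs covering the element,
    normalized by the combined shares of failing and passing runs.
    Parameter names are illustrative assumptions."""
    fail_ratio = failed_cov / total_failed if total_failed else 0.0
    pass_ratio = passed_cov / total_passed if total_passed else 0.0
    denom = fail_ratio + pass_ratio
    return fail_ratio / denom if denom else 0.0


def ochiai(failed_cov, passed_cov, total_failed):
    """Ochiai score: failing runs covering the element divided by the
    geometric mean of all failing runs and all runs covering the element."""
    denom = math.sqrt(total_failed * (failed_cov + passed_cov))
    return failed_cov / denom if denom else 0.0


# Hypothetical element executed by 2 of 2 failing runs and 1 of 4 passing runs.
print(tarantula(2, 1, 2, 4))
print(ochiai(2, 1, 2))
```

Both functions return a value between 0 and 1, with higher values marking elements more strongly associated with failing runs. Elements never executed by a failing run score 0 under both measures.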


Automated Library Recommendation, Ferdian Thung, David Lo, Julia Lawall Jun 2014

David LO

Many third party libraries are available to be downloaded and used. Using such libraries can reduce development time and make the developed software more reliable. However, developers are often unaware of suitable libraries to be used for their projects and thus they miss out on these benefits. To help developers better take advantage of the available libraries, we propose a new technique that automatically recommends libraries to developers. Our technique takes as input the set of libraries that an application currently uses, and recommends other libraries that are likely to be relevant. We follow a hybrid approach that combines association …


Understanding Widespread Changes: A Taxonomic Study, Shaowei Wang, David Lo, Lingxiao Jiang Apr 2013

David LO

Many active research studies in software engineering, such as detection of recurring bug fixes, detection of copy-and-paste bugs, and automated program transformation tools, are motivated by the assumption that many code changes (e.g., changing an identifier name) in software systems are widespread to many locations and are similar to one another. However, there is no study so far that actually analyzes widespread changes in software systems. Understanding the nature of widespread changes could empirically support the assumption, which provides insight to improve the research studies and related tools. Our study in this paper addresses such a need. We propose …


Diffusion Of Software Features: An Exploratory Study, Ferdian Thung, David Lo, Lingxiao Jiang Apr 2013

David LO

New features are frequently proposed in many software libraries. These features include new methods, classes, packages, etc. These features are utilized in many open source and commercial software systems. Some of these features are adopted very quickly, while others take a long time to be adopted. Each feature takes substantial resources to develop, test, and document. Library developers and managers need to decide what feature to prioritize and what to develop next. As a first step to aid these stakeholders, we perform an exploratory study on the diffusion or rate of adoption of features in the Java Development Kit (JDK) library. …


Predicting Project Outcome Leveraging Socio-Technical Network Patterns, Didi Surian, Yuan Tian, David Lo, Hong Cheng, Ee Peng Lim Apr 2013

David LO

Many software projects are started daily; some are successful, while others are not. Successful projects get completed, are used by many people, and bring benefits to users. Failed projects do not bring similar benefits. In this work, we are interested in developing an effective machine learning solution that predicts project outcome (i.e., success or failure) from developer socio-technical networks. To do so, we investigate successful and failed projects to find factors that differentiate the two. We analyze the socio-technical aspect of the software development process by focusing on the people that contribute to these projects and the interactions among …


Network Structure Of Social Coding In Github, Ferdian Thung, Tegawende F. Bissyande, David Lo, Lingxiao Jiang Apr 2013

David LO

Social coding enables a different experience of software development as the activities and interests of one developer are easily advertized to other developers. Developers can thus track the activities relevant to various projects in one umbrella site. Such a major change in collaborative software development makes an investigation of networkings on social coding sites valuable. Furthermore, project hosting platforms promoting this development paradigm have been thriving, among which GitHub has arguably gained the most momentum. In this paper, we contribute to the body of knowledge on social coding by investigating the network structure of social coding in GitHub. We collect …


Adoption Of Software Testing In Open Source Projects: A Preliminary Study On 50,000 Projects, Pavneet Singh Kochhar, Tegawende F. Bissyande, David Lo, Lingxiao Jiang Apr 2013

David LO

In software engineering, testing is a crucial activity that is designed to ensure the quality of program code. For this activity, development teams spend substantial resources constructing test cases to thoroughly assess the correctness of software functionality. What, however, is the proportion of open source projects that include test cases? What kind of projects are more likely to include test cases? In this study, we explore 50,000 projects and investigate the correlation between the presence of test cases and various project development characteristics, including the lines of code and the size of development teams.


When Would This Bug Get Reported?, Ferdian Thung, David Lo, Lingxiao Jiang, Lucia Lucia, Foyzur Rahman, Premkumar Devanbu Dec 2012

David LO

Millions of people, including those in the software engineering communities, have turned to microblogging services, such as Twitter, as a means to quickly disseminate information. A number of past studies by Treude et al., Storey, and Yuan et al. have shown that a wealth of interesting information is stored in these microblogs. However, microblogs also contain a large amount of noisy content that is less relevant to software developers in engineering software systems. In this work, we perform a preliminary study to investigate the feasibility of automatic classification of microblogs into two categories: relevant and irrelevant to engineering software systems. …


An Empirical Study Of Bugs In Machine Learning Systems, Ferdian Thung, Shaowei Wang, David Lo, Lingxiao Jiang Dec 2012

David LO

Many machine learning systems that include various data mining, information retrieval, and natural language processing code and libraries have been used in real world applications. Search engines, internet advertising systems, and product recommendation systems are sample users of these algorithm-intensive code and libraries. Machine learning code and toolkits have also been used in many recent studies on software mining and analytics that aim to automate various software engineering tasks. With the increasing number of important applications of machine learning systems, the reliability of such systems is also becoming increasingly important. A necessary step for ensuring reliability of such systems is …


What Does Software Engineering Community Microblog About?, Yuan Tian, Palakorn Achananuparp, Ibrahim Nelman Lubis, David Lo, Ee Peng Lim Aug 2012

David LO

Microblogging is a new trend to communicate and to disseminate information. One microblog post could potentially reach millions of users. Millions of microblogs are generated on a daily basis on popular sites such as Twitter. The popularity of microblogging among programmers, software engineers, and software users has also led to their use of microblogs to communicate software engineering issues apart from using emails and other traditional communication channels. Understanding how millions of users use microblogs in software engineering related activities would shed light on ways we could leverage the fast evolving microblogging content to aid software development efforts. In this work, …


Human: Creating Memorable Fingerprints Of Mobile Users, Gupta Payas, Kiat Wee Tan, Narayanasamy Ramasubbu, David Lo, Debin Gao, Rajesh Krishna Balan Aug 2012

David LO

In this paper, we present a new way of generating behavioral (not biometric) fingerprints from cellphone usage data. In particular, we explore whether the generated behavioral fingerprints are memorable enough to be remembered by end users. We built a system, called HuMan, that generates fingerprints from cellphone data. To test HuMan, we conducted an extensive user study that involved collecting about one month of continuous usage data (including calls, SMSes, application usage patterns, etc.) from 44 Symbian and Android smartphone users. We evaluated the memorable fingerprints generated from this rich multi-context data by asking each user to answer various …


Active Refinement Of Clone Anomaly Reports, Lucia, David Lo, Lingxiao Jiang, Aditya Budi Aug 2012

David LO

Software clones have been widely studied in the recent literature and shown to be useful for finding bugs because inconsistent changes among clones in a clone group may indicate potential bugs. However, many inconsistent clone groups are not real bugs (true positives). The excessive number of false positives could easily impede broad adoption of clone-based bug detection approaches. In this work, we aim to improve the usability of clone-based bug detection tools by increasing the rate of true positives found when a developer analyzes anomaly reports. Our idea is to control the number of anomaly reports a user can see at a …


Kb-Anonymity: A Model For Anonymized Behavior-Preserving Test And Debugging Data, Aditya Budi, David Lo, Lingxiao Jiang, Lucia Lucia Dec 2011

David LO

It is often very expensive and practically infeasible to generate test cases that can exercise all possible program states in a program. This is especially true for a medium or large industrial system. In practice, industrial clients of the system often have a set of input data collected either before the system is built or after the deployment of a previous version of the system. Such data are highly valuable as they represent the operations that matter in a client's daily business and may be used to extensively test the system. However, such data often carry sensitive information and cannot …


Code Search Via Topic-Enriched Dependence Graph Matching, Shaowei Wang, David Lo, Lingxiao Jiang Dec 2011

David LO

Source code contains textual, structural, and semantic information, which can all be leveraged for effective search. Some studies have proposed semantic code search where users can specify query topics in a natural language. Other studies can search through system dependence graphs. In this paper, we propose a semantic dependence search engine that integrates both kinds of techniques and can retrieve code snippets based on expressive user queries describing both topics and dependencies. Users can specify their search targets in a free-form format describing desired topics (i.e., high-level semantics or functionality of the target code); a specialized graph query language …


Finding Relevant Answers In Software Forums, Swapna Gottopati, David Lo, Jing Jiang Dec 2011

David LO

Online software forums provide a huge amount of valuable content. Developers and users often ask questions and receive answers from such forums. The availability of a vast amount of thread discussions in forums provides ample opportunities for knowledge acquisition and summarization. For a given search query, current search engines use traditional information retrieval approaches to extract webpages containing relevant keywords. However, in software forums, there are often many threads containing similar keywords, where each thread could contain many posts, as many as 1,000 or more. Manually finding relevant answers from these long threads is a painstaking task to …