Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Articles 1 - 13 of 13
Full-Text Articles in Physical Sciences and Mathematics
Sewordsim: Software-Specific Word Similarity Database, Yuan Tian, David Lo, Julia Lawall
David LO
Measuring the similarity of words is important in accurately representing and comparing documents, and thus improves the results of many natural language processing (NLP) tasks. The NLP community has proposed various measurements based on WordNet, a lexical database that contains relationships between many pairs of words. Recently, a number of techniques have been proposed to address software engineering issues such as code search and fault localization that require understanding natural language documents, and a measure of word similarity could improve their results. However, WordNet only contains information about word senses in general-purpose conversation, which often differ from word senses in …
Leveraging Machine Learning And Information Retrieval Techniques In Software Evolution Tasks: Summary Of The First Malir-Se Workshop, At Ase 2013, - Lucia, David Lo, Giuseppe Scanniello, Alessandro Marchetto, Nasir Ali, Collin Mcmillan
David LO
The first International Workshop on MAchine Learning and Information Retrieval for Software Evolution (MALIR-SE) was held on the 11th of November 2013. The workshop was held in conjunction with the 28th IEEE/ACM International Conference on Automated Software Engineering (ASE) in Silicon Valley, California, USA. The workshop brought together researchers and practitioners who were interested in leveraging machine learning and information retrieval techniques to automate various software evolution tasks. During the workshop, papers on the application of machine learning and information retrieval techniques to bug fix time prediction and anti-pattern detection were presented. There were also discussions on the presented papers and …
Hierarchical Parallel Algorithm For Modularity-Based Community Detection Using Gpus, Chun Yew Cheong, Huynh Phung Huynh, David Lo, Rick Siow Mong Goh
David LO
This paper describes the design of a hierarchical parallel algorithm for accelerating community detection, which involves partitioning a network into communities of densely connected nodes. The algorithm is based on the Louvain method developed at the Université Catholique de Louvain, which uses modularity to measure community quality and has been successfully applied to many different types of networks. The proposed hierarchical parallel algorithm targets three levels of parallelism in the Louvain method, and it has been implemented on single-GPU and multi-GPU architectures. Benchmarking results on several large web-based networks and popular social networks show that on top of offering speedups …
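The modularity score that the Louvain method optimizes can be illustrated with a small sequential sketch. This is not the paper's GPU algorithm; it only computes the standard Newman modularity of a given partition of an undirected, unweighted graph. The function name `modularity` and the input layout (an edge list plus a node-to-community dictionary) are assumptions made for this example.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity of a partition (illustrative sketch).

    edges: list of (u, v) pairs; each undirected edge appears once, no self-loops.
    community: dict mapping every node to its community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    intra = defaultdict(int)   # edges with both endpoints inside community c
    degree = defaultdict(int)  # sum of node degrees inside community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1
    # Q = sum over communities of (intra-edge fraction - expected fraction)
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge (hypothetical toy graph).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {n: "all" for n in range(6)}
print(modularity(edges, split))
print(modularity(edges, merged))
```

Splitting the toy graph into its two triangles scores higher than placing every node in one community, which is the kind of comparison a modularity-based method relies on when deciding how to group nodes.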
Software Internationalization And Localization: An Industrial Experience, Xin Xia, David Lo, Feng Zhu, Xinyu Wang, Bo Zhou
David LO
Software internationalization and localization are important steps in distributing and deploying software to different regions of the world. Internationalization refers to the process of reengineering a system so that it can support various languages and regions without further modification. Localization refers to the process of adapting internationalized software for a specific language or region. For various reasons, many large legacy systems did not consider internationalization and localization at the early stage of development. In this paper, we present our experience with, and propose a process along with tool support for, software internationalization and localization. We reengineer a large …
Leveraging Web 2.0 For Software Evolution, Yuan Tian, David Lo
David LO
In this era of Web 2.0, much information is available on the Internet. Software forums, mailing lists, and question-and-answer sites contain lots of technical information. Blogs contain developers’ opinions, ideas, and descriptions of their day-to-day activities. Microblogs contain recent and popular software news. Software forges contain records of socio-technical interactions of developers. All these resources could potentially be leveraged to help developers in performing software evolution activities. In this chapter, we first present information that is available from these Web 2.0 resources. We then introduce empirical studies that investigate how developers contribute information to and use these resources. Next, we …
An Empirical Study Of Bugs In Build Process, Xiaoqiong Zhao, Xin Xia, Pavneet Singh Kochhar, David Lo, Shanping Li
David LO
The software build process translates source code into executable programs, packages the programs, generates documents, and distributes products. In this paper, we perform an empirical study to characterize build process bugs. We analyze bugs in the build process in 5 open-source systems under Apache, namely CXF, Camel, Felix, Struts, and Tuscany. We compare build process bugs and other bugs across 3 different dimensions, i.e., bug severity, bug fix time, and the number of files modified to fix a bug. Our results show that the fraction of build process bugs which are above major severity level is lower than that of other bugs. …
Build System Analysis With Link Prediction, Xin Xia, David Lo, Xinyu Wang, Bo Zhou
David LO
Compilation is an important step in building a working software system. To compile large systems, build systems such as make are typically used. In this paper, we investigate a new research problem for build configuration file (e.g., Makefile) analysis: how to predict missed dependencies in a build configuration file. We refer to this problem as dependency mining. Based on a Makefile, we build a dependency graph capturing various relationships defined in the Makefile. By representing a Makefile as a dependency graph, we map the dependency mining problem to a link prediction problem, and leverage 9 state-of-the-art link prediction algorithms to solve …
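The idea of treating missed dependencies as missing links can be sketched with a toy example. The abstract does not name the nine algorithms, so the common-neighbors score used here is only an assumed stand-in, chosen because it is a standard link prediction heuristic. The function name `common_neighbor_scores`, the rule dictionary, and the choice to treat the dependency graph as undirected are all assumptions for illustration.

```python
from itertools import combinations


def common_neighbor_scores(rules):
    """Score unlinked node pairs by their number of shared neighbors.

    rules: dict mapping a target to the set of its declared prerequisites,
    a simplified stand-in for the rules parsed from a Makefile.
    """
    # Build an undirected adjacency map from the target -> prerequisite rules.
    neighbors = {}
    for target, prereqs in rules.items():
        neighbors.setdefault(target, set())
        for p in prereqs:
            neighbors[target].add(p)
            neighbors.setdefault(p, set()).add(target)

    # Every pair that is not already linked is a candidate missing dependency.
    scores = {}
    for a, b in combinations(sorted(neighbors), 2):
        if b in neighbors[a]:
            continue
        shared = len(neighbors[a] & neighbors[b])
        if shared:
            scores[(a, b)] = shared
    return scores


# Hypothetical rules: app depends on two object files, each built from sources.
rules = {
    "app": {"main.o", "util.o"},
    "main.o": {"main.c", "util.h"},
    "util.o": {"util.c", "util.h"},
}
scores = common_neighbor_scores(rules)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, score)
```

Higher-scoring pairs are the ones a reviewer would inspect first as possibly missing from the build file; a real study would also need a way to validate those candidates.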
Collaboration Patterns In Software Developer Network, Didi Surian, David Lo, Ee Peng Lim
David LO
No abstract provided.
An Empirical Study Of Bug Report Field Reassignment, Xin Xia, David Lo, Ming Wen, Shihab Emad, Bo Zhou
David LO
A bug report contains many fields, such as product, component, severity, priority, fixer, operating system (OS), platform, etc., which provide important information for the bug triaging and fixing process. It is important to make sure that bug information is correct since previous studies showed that the wrong assignment of bug report fields could increase the bug fixing time, and even delay the delivery of the software. In this paper, we perform an empirical study on bug report field reassignments in open-source software projects. To better understand why bug report fields are reassigned, we manually collect 99 recent bug reports that …
Automated Construction Of A Software-Specific Word Similarity Database, Yuan Tian, David Lo, Julia Lawall
David LO
Many automated software engineering approaches, including code search, bug report categorization, and duplicate bug report detection, measure similarities between two documents by analyzing natural language contents. Often different words are used to express the same meaning, and thus measuring similarities using exact matching of words is insufficient. To solve this problem, past studies have shown the need to measure the similarities between pairs of words. To meet this need, the natural language processing community has built WordNet, which is a manually constructed lexical database that records semantic relations among words and can be used to measure how similar two words …
Towards More Accurate Multi-Label Software Behavior Learning, Xin Xia, Feng Yang, David Lo, Zhenyu Chen, Xinyu Wang
David LO
In a modern software system, when a program fails, a crash report which contains an execution trace is sent to the software vendor for diagnosis. A crash report which corresponds to a failure could be caused by multiple types of faults simultaneously. Many large companies such as Baidu organize a team to analyze these failures, and classify them into multiple labels (i.e., multiple types of faults). However, it would be time-consuming and difficult for developers to manually analyze these failures and come up with appropriate fault labels. In this paper, we automatically classify a failure into multiple types of …
Proceedings Of The 2nd International Workshop On Software Mining, Ming Li, Hongyu Zhang, David Lo
David LO
No abstract provided.
Boat: An Experimental Platform For Researchers To Comparatively And Reproducibly Evaluate Bug Localization Techniques, Xinyu Wang, David Lo, Xin Xia, Xingen Wang, Pavneet Singh Kochhar, Yuan Tian, Xiaohu Yang, Shanping Li, Jianling Sun, Bo Zhou
David LO
Bug localization refers to the process of identifying source code files that contain defects from descriptions of these defects which are typically contained in bug reports. There have been many bug localization techniques proposed in the literature. However, often it is hard to compare these techniques since different evaluation datasets are used. At times the datasets are not made publicly available and thus it is difficult to reproduce reported results. Furthermore, some techniques are only evaluated on small datasets and thus it is not clear whether the results are generalizable. Thus, there is a need for a platform that allows …