Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 19 of 19

Full-Text Articles in Physical Sciences and Mathematics

Automatic Pull Request Title Generation, Ting Zhang, Ivana Clairine Irsan, Ferdian Thung, Donggyun Han, David Lo, Lingxiao Jiang Oct 2022

Research Collection School Of Computing and Information Systems

Pull Requests (PRs) are a mechanism on modern collaborative coding platforms, such as GitHub. PRs allow developers to tell others that their code changes are available for merging into another branch in a repository. A PR needs to be reviewed and approved by the core team of the repository before the changes are merged into the branch. Usually, reviewers need to identify a PR that is in line with their interests before providing a review. By default, PRs are arranged in a list view that shows the titles of PRs. Therefore, it is desirable to have a precise and concise …


AutoPRTitle: A Tool For Automatic Pull Request Title Generation, Ivana Clairine Irsan, Ting Zhang, Ferdian Thung, David Lo, Lingxiao Jiang Oct 2022

Research Collection School Of Computing and Information Systems

With the rise of the pull request mechanism in software development, the quality of pull requests has gained more attention. Prior works focus on improving the quality of pull request descriptions and several approaches have been proposed to automatically generate pull request descriptions. As an essential component of a pull request, pull request titles have not received a similar level of attention. To further facilitate automation in software development and to help developers draft high-quality pull request titles, we introduce AutoPRTitle. AutoPRTitle is specifically designed to generate pull request titles automatically. AutoPRTitle can generate a precise and succinct pull request …


Including Everyone, Everywhere: Understanding Opportunities And Challenges Of Geographic Gender-Inclusion In OSS, Gede Artha Azriadi Prana, Denae Ford, Ayushi Rastogi, David Lo, Rahul Purandare, Nachiappan Nagappan Feb 2022

Research Collection School Of Computing and Information Systems

The gender gap is a significant concern facing the software industry as development becomes more geographically distributed. Widely shared reports indicate that gender differences may be specific to each region. However, how complete can these reports be with little to no research reflective of the Open Source Software (OSS) process and communities software is now commonly developed in? Our study presents a multi-region geographical analysis of gender inclusion on GitHub. This mixed-methods approach includes quantitatively investigating differences in gender inclusion in projects across geographic regions and investigating these trends over time using data from contributions to 21,456 project repositories. …


Can We Make It Better? Assessing And Improving Quality Of GitHub Repositories, Gede Artha Azriadi Prana Nov 2021

Dissertations and Theses Collection (Open Access)

The code hosting platform GitHub has gained immense popularity worldwide in recent years, with over 200 million repositories hosted as of June 2021. Due to its popularity, it has great potential to facilitate widespread improvements across many software projects. Naturally, GitHub has attracted much research attention, and the source code in the various repositories it hosts also provides an opportunity to apply techniques and tools developed by software engineering researchers over the years. However, much of the existing body of research applicable to GitHub focuses on the code quality of software projects and ways to improve it. Fewer works focus on potential …


A Large Scale Study Of Long-Time Contributor Prediction For GitHub Projects, Lingfeng Bao, Xin Xia, David Lo, Gail C. Murphy Jun 2021

Research Collection School Of Computing and Information Systems

The continuous contributions made by long-time contributors (LTCs) are a key factor enabling open source software (OSS) projects to succeed and survive. We study GitHub as it has a large number of OSS projects and millions of contributors, which enables the study of the transition from newcomers to LTCs. In this paper, we investigate whether we can effectively predict which newcomers in OSS projects will become LTCs based on their activity data collected from GitHub. We collect GitHub data from GHTorrent, a mirror of GitHub data. We select the most popular 917 projects, which contain 75,046 contributors. …


Keen2Act: Activity Recommendation In Online Social Collaborative Platforms, Roy Ka-Wei Lee, Thong Hoang, Richard J. Oentaryo, David Lo Jul 2020

Research Collection School Of Computing and Information Systems

Social collaborative platforms such as GitHub and Stack Overflow have been increasingly used to improve work productivity via collaborative efforts. To improve user experiences in these platforms, it is desirable to have a recommender system that can suggest not only items (e.g., a GitHub repository) to a user, but also activities to be performed on the suggested items (e.g., forking a repository). To this end, we propose a new approach dubbed Keen2Act, which decomposes the recommendation problem into two stages: the Keen and Act steps. The Keen step identifies, for a given user, a (sub)set of items in which he/she …


Sentiment Analysis Over Collaborative Relationships In Open Source Software Projects, Lingjia Li, Jian Cao, David Lo Jul 2020

Research Collection School Of Computing and Information Systems

Sentiments and collaboration efficiency are key factors in the success of the open source software (OSS) development process. However, in the software engineering domain, no studies have been conducted to analyze the effect between collaborators' sentiments, and the role of sentiment in collaborative relationships during the development process. In this study, we apply sentiment analysis and statistical analysis on collaboration artifacts over five projects on GitHub. We use sentiment consistency to quantify the relation between sentiments in collaborative relationships. It is found that sentiment consistency is positively correlated with the closeness of collaborative relationships and collaborators' overall sentiment states. We …


AUSearch: Accurate API Usage Search In GitHub Repositories With Type Resolution, Muhammad Hilmi Asyrofi, Ferdian Thung, David Lo, Lingxiao Jiang Feb 2020

Research Collection School Of Computing and Information Systems

Nowadays, developers use APIs to implement their applications. To know how to use an API, developers may search for code examples that use the API in repositories such as GitHub. Although code search engines have been developed to help developers perform such search, these engines typically only accept a query containing the description of the task that needs to be implemented or the names of the APIs that the developer wants to use without the capability for the developer to specify particular search constraints, such as the class and parameter types that the relevant API should take. These engines are …


Who Should Make Decision On This Pull Request? Analyzing Time-Decaying Relationships And File Similarities For Integrator Prediction, Jing Jiang, David Lo, Jiateng Zheng, Xin Xia, Yun Yang, Li Zhang Aug 2019

Research Collection School Of Computing and Information Systems

In the pull-based development model, integrators are responsible for making decisions about whether to accept pull requests and integrate code contributions. Ideally, pull requests are assigned to integrators and evaluated within a short time after their submission. However, the volume of incoming pull requests is large in popular projects, and integrators often encounter difficulties in processing pull requests in a timely fashion. Therefore, an automatic integrator prediction approach is required to assign appropriate pull requests to integrators. In this paper, we propose an approach, TRFPre, which analyzes Time-decaying Relationships and File similarities to predict integrators. We evaluate the effectiveness of TRFPre on 24 projects …


A First Look At Unfollowing Behavior On GitHub, Jing Jiang, David Lo, Yun Yang, Jianfeng Li, Li Zhang Jan 2019

Research Collection School Of Computing and Information Systems

Many open source software projects rely on contributors to fix bugs and contribute new features. On GitHub, developers often broadcast their activities to followers, which may entice followers to be project contributors. It is important to understand unfollowing behavior, maintain current followers, and attract some followers to become contributors in OSS projects. Our objective in this paper is to provide a comprehensive analysis of unfollowing behavior on GitHub. To the best of our knowledge, we present a first look at unfollowing behavior on GitHub. We collect a dataset containing 701,364 developers and their 4,602,440 following relationships in March 2016. We also …


Augmenting And Structuring User Queries To Support Efficient Free-Form Code Search, Raphael Sirres, Tegawendé F. Bissyande, Dongsun Kim, David Lo, Jacques Klein, Kisub Kim, Yves Le Traon Oct 2018

Research Collection School Of Computing and Information Systems

Source code terms such as method names and variable types are often different from conceptual words mentioned in a search query. This vocabulary mismatch problem can make code search inefficient. In this paper, we present COde voCABUlary (CoCaBu), an approach to resolving the vocabulary mismatch problem when dealing with free-form code search queries. Our approach leverages common developer questions and the associated expert answers to augment user queries with the relevant, but missing, structural code entities in order to improve the performance of matching relevant code examples within large code repositories. To instantiate this approach, we build GitSearch, a code …


Unusual Events In GitHub Repositories, Christoph Treude, Larissa Leite, Maurício Aniche Aug 2018

Research Collection School Of Computing and Information Systems

In large and active software projects, it becomes impractical for a developer to stay aware of all project activity. While it might not be necessary to know about each commit or issue, it is arguably important to know about the ones that are unusual. To investigate this hypothesis, we identified unusual events in 200 GitHub projects using a comprehensive list of ways in which an artifact can be unusual and asked 140 developers responsible for or affected by these events to comment on the usefulness of the corresponding information. Based on 2,096 answers, we identify the subset of unusual events …


Where Does Google Find API Documentation?, Christoph Treude, Maurício Aniche Jun 2018

Research Collection School Of Computing and Information Systems

The documentation of popular APIs is spread across many formats, from vendor-curated reference documentation to Stack Overflow threads. For developers, it is often not obvious from where a particular piece of information can be retrieved. To understand this documentation landscape, we systematically conducted Google searches for the elements of ten popular APIs. We found that their documentation is widely dispersed among many sources, that GitHub and Stack Overflow play a prominent role among the search results, and that most sources are quick to document new API functionalities. These findings inform API vendors about where developers find documentation about their products, …


Understanding Inactive Yet Available Assignees In GitHub, Jing Jiang, David Lo, Xinyu Ma, Fuli Feng, Li Zhang Nov 2017

Research Collection School Of Computing and Information Systems

Context: In GitHub, an issue or a pull request can be assigned to a specific assignee who is responsible for working on this issue or pull request. Due to the principle of voluntary participation, available assignees may remain inactive in projects. If assignees ever participate in projects, they are active assignees; otherwise, they are inactive yet available assignees (inactive assignees for short). Objective: Our objective in this paper is to provide a comprehensive analysis of inactive yet available assignees in GitHub. Method: We collect 2,374,474 records of activities in 37 popular projects, and 797,756 records of activities in 687 projects …


Cataloging GitHub Repositories, Abhishek Sharma, Ferdian Thung, Pavneet Singh Kochhar, Agus Sulistya, David Lo Jun 2017

Research Collection School Of Computing and Information Systems

GitHub is one of the largest and most popular repository hosting services today, having about 14 million users and more than 54 million repositories as of March 2017. This makes it an excellent platform to find projects that developers are interested in exploring. GitHub showcases its most popular projects by cataloging them manually into categories such as DevOps tools, web application frameworks, and game engines. We propose that such cataloging should not be limited only to popular projects. We explore the possibility of developing such a cataloging system by automatically extracting functionality-descriptive text segments from readme files of GitHub repositories. …


Revisiting Assert Use In GitHub Projects, Pavneet Singh Kochhar, David Lo Jun 2017

Research Collection School Of Computing and Information Systems

Assertions are often used to test the assumptions that developers have about a program. An assertion contains a boolean expression which developers believe to be true at a particular program point. It throws an error if the expression is not satisfied, which helps developers to detect and correct bugs. Since assertions make developer assumptions explicit, assertions are also believed to improve understandability of code. Recently, Casalnuovo et al. analyse C and C++ programs to understand the relationship between assertion usage and defect occurrence. Their results show that asserts have a small effect on reducing the density of bugs and developers …


Why And How Developers Fork What From Whom In GitHub, Jing Jiang, David Lo, Jiahuan He, Xin Xia, Pavneet Singh Kochhar, Li Zhang Feb 2017

Research Collection School Of Computing and Information Systems

Forking is the creation of a new software repository by copying another repository. Though forking is controversial in the traditional open source software (OSS) community, it is encouraged and is a built-in feature in GitHub. Developers freely fork repositories, use the code as their own, and make changes. A deep understanding of repository forking can provide important insights for the OSS community and GitHub. In this paper, we explore why and how developers fork what from whom in GitHub. We collect a dataset containing 236,344 developers and 1,841,324 forks. We conduct surveys, and analyze programming languages and owners of forked repositories. Our main …


Detecting Similar Repositories On GitHub, Yun Zhang, David Lo, Pavneet Singh Kochhar, Xin Xia, Quanlai Li, Jianling Sun Feb 2017

Research Collection School Of Computing and Information Systems

GitHub contains millions of repositories, many of which are similar to one another (i.e., having similar source code or implementing similar functionalities). Finding similar repositories on GitHub can be helpful for software engineers as it can help them reuse source code, build prototypes, identify alternative implementations, explore related projects, find projects to contribute to, and discover code theft and plagiarism. Previous studies have proposed techniques to detect similar applications by analyzing API usage patterns and software tags. However, these prior studies either only make use of a limited source of information or use information not available for projects on GitHub. …


Popularity, Interoperability, And Impact Of Programming Languages In 100,000 Open Source Projects, Tegawende F. Bissyande, Ferdian Thung, David Lo, Lingxiao Jiang, Laurent Réveillère Jul 2013

Research Collection School Of Computing and Information Systems

Programming languages have been proposed even before the era of the modern computer. As years have gone by, computer resources have increased and application domains have expanded, leading to the proliferation of hundreds of programming languages, each attempting to improve over others or to address new programming paradigms. These languages range from procedural languages like C, to object-oriented languages like Java, to functional languages such as ML and Haskell. Unfortunately, there is a lack of large-scale and comprehensive studies that examine the “popularity”, “interoperability”, and “impact” of various programming languages. To fill this gap, this study investigates a hundred thousand …