Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Computer Sciences

Publications and Research

Software repository mining

Full-Text Articles in Physical Sciences and Mathematics

QuerTCI: A Tool Integrating GitHub Issue Querying With Comment Classification, Ye Paing, Tatiana Castro Vélez, Raffi T. Khatchadourian Jul 2022

Empirical Software Engineering (ESE) researchers study (open-source) project issues, along with the comments and threads within them, to discover, among other things, the challenges developers face when incorporating new technologies, platforms, and programming language constructs. However, such threads accumulate, becoming unwieldy and hindering any insight researchers may gain. While existing approaches alleviate this burden by classifying issue thread comments, there is a gap between searching popular open-source software repositories (e.g., those on GitHub) for issues containing particular keywords and feeding the results into a classification model. This paper demonstrates a research infrastructure tool called QuerTCI that bridges this gap by integrating the GitHub issue comment search …


A Tool For Rejuvenating Feature Logging Levels Via Git Histories And Degree Of Interest, Yiming Tang, Allan Spektor, Raffi T. Khatchadourian, Mehdi Bagherzadeh May 2022

Logging is a significant programming practice. Due to the highly transactional nature of modern software applications, massive amounts of logs are generated every day, which may overwhelm developers. Logging information overload can be dangerous to software applications. Using log levels, developers can print useful information while hiding verbose logs during software runtime. As software evolves, the log levels of logging statements associated with the surrounding software feature implementation may also need to be altered. Maintaining log levels necessitates a significant amount of manual effort. In this paper, we demonstrate an automated approach that can rejuvenate feature log …


An Empirical Study Of Refactorings And Technical Debt In Machine Learning Systems, Yiming Tang, Raffi T. Khatchadourian, Mehdi Bagherzadeh, Rhia Singh, Ajani Stewart, Anita Raja May 2021

Machine Learning (ML) systems, i.e., those with ML capabilities, including Deep Learning (DL) systems, are pervasive in today’s data-driven society. Such systems are complex; they comprise ML models and many subsystems that support learning processes. As with other complex systems, ML systems are prone to classic technical debt issues, especially when such systems are long-lived, but they also exhibit debt specific to these systems. Unfortunately, there is a knowledge gap in how ML systems actually evolve and are maintained. In this paper, we fill this gap by studying refactorings, i.e., source-to-source semantics-preserving program transformations, performed in real-world, open-source …


Automated Evolution Of Feature Logging Statement Levels Using Git Histories And Degree Of Interest, Yiming Tang, Allan Spektor, Raffi Khatchadourian, Mehdi Bagherzadeh Apr 2021

Logging, used for everything from system events and security breaches to more informational yet essential aspects of software features, is pervasive. Given the high transactionality of today’s software, logging effectiveness can be reduced by information overload. Log levels help alleviate this problem by associating a priority with logs that can later be filtered. As software evolves, however, levels of logs documenting surrounding feature implementations may also require modification, as features once deemed important may have decreased in urgency and vice versa. We present an automated approach that assists developers in evolving levels of such (feature) logs. The approach, based on mining Git histories and manipulating …


An Empirical Study Of Refactorings And Technical Debt In Machine Learning Systems, Yiming Tang, Raffi T. Khatchadourian, Mehdi Bagherzadeh, Rhia Singh, Ajani Stewart, Anita Raja Aug 2020

Machine Learning (ML) systems, i.e., those with ML capabilities, including Deep Learning (DL) systems, are pervasive in today's data-driven society. Such systems are complex; they comprise ML models and many subsystems that support learning processes. As with other complex systems, ML systems are prone to classic technical debt issues, especially when such systems are long-lived, but they also exhibit debt specific to these systems. Unfortunately, there is a knowledge gap in how ML systems actually evolve and are maintained. In this paper, we fill this gap by studying refactorings, i.e., source-to-source semantics-preserving program transformations, performed in real-world, open-source …