Open Access. Powered by Scholars. Published by Universities.®

Engineering Commons

UNF Graduate Theses and Dissertations

Theses/Dissertations

2019

Academic -- UNF -- Computing; code clone; clone detection; topic modeling; machine learning; software refactoring; software engineering; Latent Dirichlet Allocation -- Testing; Topic models -- Testing; Generative statistical models -- Testing; Code clone detection -- Statistical models; Code clone detection -- Software; CloneTM -- Testing

Full-Text Articles in Engineering

A Topic Modeling Approach For Code Clone Detection, Mohammed Salman Khan Jan 2019

This thesis describes the potential benefits of Latent Dirichlet Allocation (LDA) as a technique for code clone detection. The objective is to propose a language-independent, effective, and scalable approach for identifying similar code fragments in relatively large software systems. The main assumption is that the latent topic structure of software artifacts indicates the presence of code clones. The hypothesis is that artifacts with similar topic distributions contain duplicated code fragments; to test this hypothesis, an experimental investigation was conducted using multiple datasets from various application domains. In addition, CloneTM, an LDA-based …
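The abstract does not state how CloneTM tokenizes code, fits LDA, or compares topic distributions. As a minimal sketch of the stated hypothesis only (not CloneTM itself), the code below assumes a regex-based identifier tokenizer as the language-independent step, a collapsed Gibbs sampler for LDA, and Hellinger distance between per-fragment topic distributions. The function names, hyperparameters (`alpha`, `beta`, `iterations`), and the `threshold` of 0.2 are hypothetical choices for illustration.

```python
import math
import random
import re


def tokenize(fragment):
    """Hypothetical language-independent tokenizer: keep identifier-like
    tokens, split camelCase and snake_case, and lowercase the parts."""
    tokens = []
    for word in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", fragment):
        # Split on underscores and on lowercase/digit-to-uppercase boundaries.
        for part in re.split(r"_|(?<=[a-z0-9])(?=[A-Z])", word):
            if part:
                tokens.append(part.lower())
    return tokens


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic distribution per doc."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    V = len(vocab)
    ndk = [[0] * num_topics for _ in docs]          # doc-topic counts
    nkw = [[0] * V for _ in range(num_topics)]      # topic-word counts
    nk = [0] * num_topics                           # tokens per topic
    z = []                                          # topic assignment per token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][vocab[w]] += 1
            nk[k] += 1
        z.append(zd)
    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = vocab[w], z[d][i]
                # Remove the current token from the counts.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                # p(z=t | rest) is proportional to (n_dt + a)(n_tw + b)/(n_t + V*b).
                weights = [(ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1
    # theta_dt = (n_dt + alpha) / (N_d + K * alpha)
    return [[(ndk[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in topics]
            for d, doc in enumerate(docs)]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def clone_candidates(fragments, num_topics=3, threshold=0.2, seed=0):
    """Flag fragment pairs whose topic distributions are closer than `threshold`."""
    docs = [tokenize(f) for f in fragments]
    theta = lda_gibbs(docs, num_topics, seed=seed)
    pairs = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            dist = hellinger(theta[i], theta[j])
            if dist <= threshold:
                pairs.append((i, j, dist))
    return pairs


if __name__ == "__main__":
    fragments = [
        "int sumArray(int[] values) { int total = 0; for (int v : values) total += v; return total; }",
        "def sum_array(values):\n    total = 0\n    for v in values:\n        total += v\n    return total",
        "void openSocket(String host, int port) { socket = new Socket(host, port); }",
    ]
    for i, j, dist in clone_candidates(fragments):
        print(f"fragments {i} and {j}: Hellinger distance {dist:.3f}")
```

Because the tokenizer ignores syntax and keeps only identifier parts, the Java and Python summation fragments share most of their vocabulary, which is the kind of language independence the abstract aims for. Which pairs pass the threshold depends on the sampler's seed, the number of topics, and the corpus; on a toy input this small, the output is only illustrative, and a real evaluation would need a larger corpus and tuned parameters.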