Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Software Engineering

2022

Security

Full-Text Articles in Physical Sciences and Mathematics

Data Quality Matters: A Case Study On Data Label Correctness For Security Bug Report Prediction, Xiaoxue Wu, Wei Zheng, Xin Xia, David Lo Jul 2022

Research Collection School Of Computing and Information Systems

In research on mining software repositories, we need to label a large amount of data to construct a predictive model. The correctness of the labels substantially affects the performance of the model. However, few studies have investigated the impact of mislabeled instances on a predictive model. To bridge this gap, in this article we perform a case study on security bug report (SBR) prediction. We found that five publicly available datasets for SBR prediction contain many mislabeled instances, which leads to the poor performance of SBR prediction models in recent studies (e.g., the work of …