Open Access. Powered by Scholars. Published by Universities.®
Full-Text Articles in Physical Sciences and Mathematics
Data Quality Matters: A Case Study On Data Label Correctness For Security Bug Report Prediction, Xiaoxue Wu, Wei Zheng, Xin Xia, David Lo
Research Collection School Of Computing and Information Systems
In mining software repositories research, we need to label a large amount of data to construct a predictive model. The correctness of these labels can substantially affect the model's performance. However, few studies have investigated the impact of mislabeled instances on a predictive model. To bridge this gap, in this article we perform a case study on security bug report (SBR) prediction. We found that five publicly available datasets for SBR prediction contain many mislabeled instances, which lead to the poor performance of the SBR prediction models in recent studies (e.g., the work of …