
Fahad Al Debeyan

Lecturer

Faculty member, Department of Software Engineering

Computer and Information Sciences
Building 31, Office 2029
Publications
Journal article
2024

The Impact of Hard and Easy Negative Training Data on Vulnerability Prediction Performance

Keywords: Software vulnerability prediction; Vulnerability datasets; Machine learning

Vulnerability prediction models have been shown to perform poorly in the real world. We examine how the composition of negative training data influences vulnerability prediction model performance. Inspired by other disciplines (e.g. image processing), we focus on whether distinguishing between negative training data that is ‘easy’ to tell apart from positive data (very different from positive data) and negative training data that is ‘hard’ to tell apart from positive data (very similar to positive data) affects vulnerability prediction performance. We use a range of popular machine learning algorithms, including deep learning, to build models based on vulnerability patch data curated by Reis and Abreu, as well as the MSR dataset. Our results suggest that models trained on higher ratios of easy negatives perform better, plateauing at 15 easy negatives per positive instance. We also report that the best-performing ML algorithm varies with the negative sample used. Overall, we found that the negative sampling approach used significantly impacts model performance, potentially leading to overly optimistic results. The ratio of ‘easy’ versus ‘hard’ negative training data should be explicitly considered when building vulnerability prediction models for the real world.

Publisher name
Journal of Systems and Software
Volume number
211
Pages
112003
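
The abstract above describes training sets built with a chosen number of 'easy' negatives per positive instance. The following is a minimal sketch of that idea in Python, not the authors' pipeline. It assumes the negatives have already been split into easy and hard pools by some external criterion (the abstract does not say how); the function name `build_training_set` and its parameters are hypothetical.

```python
import random


def build_training_set(positives, easy_negatives, hard_negatives,
                       easy_per_positive, hard_per_positive=0, seed=0):
    """Return shuffled (sample, label) pairs with fixed negative-to-positive ratios.

    Assumption: the caller has already separated negatives into 'easy' and
    'hard' pools; this sketch only handles the sampling ratio.
    """
    rng = random.Random(seed)
    # Cap each request at the pool size so sampling never fails.
    n_easy = min(len(easy_negatives), easy_per_positive * len(positives))
    n_hard = min(len(hard_negatives), hard_per_positive * len(positives))
    negatives = rng.sample(easy_negatives, n_easy) + rng.sample(hard_negatives, n_hard)
    # Label positives as 1 and negatives as 0, then shuffle the combined set.
    data = [(x, 1) for x in positives] + [(x, 0) for x in negatives]
    rng.shuffle(data)
    return data


# Hypothetical placeholder samples; real inputs would be code units or features.
positives = ["vuln_a", "vuln_b"]
easy_pool = [f"easy_{i}" for i in range(100)]
hard_pool = [f"hard_{i}" for i in range(10)]
train = build_training_set(positives, easy_pool, hard_pool,
                           easy_per_positive=15, hard_per_positive=1)
```

Varying `easy_per_positive` while holding the positives fixed is one way to compare models across negative-sampling ratios, such as the 15-to-1 plateau the abstract reports.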
More publications

Vulnerability prediction models have been shown to perform poorly in the real world. We examine how the composition of negative training data influences vulnerability prediction model performance…

By Fahad Al Debeyan, Lech Madeyski, Tracy Hall, David Bowes
2024
Published in:
Journal of Systems and Software

The recent emergence of the Log4jshell vulnerability demonstrates the importance of detecting code vulnerabilities in software systems. Software Vulnerability Prediction Models (…

By Fahad Al Debeyan, Tracy Hall, David Bowes
2022
Published in:
PROMISE 2022: Proceedings of the 18th International Conference on Predictive Models and Data Analytics in Software Engineering

Regular expression matching tools (grep) match regular expressions to lines of text. However, because of the complexity that regular expressions can reach, it is challenging to apply state of the…

By Fahad Aldebeyan
2018