Citation
Bala, Yahaya Zakariyau and Abdul Samat, Pathiah and Sharif, Khaironi Yatim and Manshor, Noridayu
(2025)
Impact of feature set size on the performance of machine learning models in Cross-Project Defect Prediction (CPDP).
International Journal on Advanced Science, Engineering and Information Technology, 15 (4).
pp. 1353-1360.
ISSN: 2088-5334; eISSN: 2460-6952
Abstract
Software defect prediction is a vital area in software engineering that helps developers detect potential faults before software is deployed. Cross-Project Defect Prediction (CPDP) is particularly valuable, as it enables the use of defect data from one project to predict errors in another, making it beneficial in cases where project-specific defect data is insufficient. However, the effectiveness of CPDP largely depends on how well the machine learning models are trained, and a key factor influencing their performance is the size of the feature set used. This study focuses on evaluating the impact of feature set size on the performance of two widely used machine learning models, Random Forest (RF) and Support Vector Machine (SVM), in the context of CPDP. We used defect datasets from the AEEEM repository, which consists of multiple real-world software projects. An outlier detection technique was applied to select the number of features in the training and testing data, ensuring a systematic analysis of their impact on model performance. The F1-score was used as the primary evaluation metric, as it provides a balance between precision and recall, making it a reliable measure of defect prediction accuracy. Our findings suggest that the size of the feature set plays a crucial role in determining the effectiveness of both RF and SVM models. Too many features introduce noise, reducing predictive accuracy, while too few cause underfitting, leading to the missed detection of defect patterns. Identifying an optimal feature set size improves model performance, providing practical insights for enhancing CPDP. Optimizing feature selection can lead to more accurate predictions, thereby aiding software maintenance and enhancing overall software quality.
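The abstract names the F1-score as the primary metric because it balances precision and recall. As a minimal sketch of that calculation, the block below computes all three values for the defective class using only the standard library. The function name `precision_recall_f1` and the sample labels are hypothetical, chosen for illustration; they are not taken from the study or from the AEEEM data.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the positive (defective) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels for eight modules in a target project (1 = defective).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because F1 is the harmonic mean of precision and recall, a model that flags almost nothing as defective (high precision, low recall) or almost everything (high recall, low precision) scores poorly. That property makes it a reasonable choice for defect datasets, where defective modules are usually the minority class.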