UPM Institutional Repository

Improving the performance of TFS with ensemble learning for cross-project software defect prediction


Citation

Abdul Samat, Pathiah and Bala, Yahaya Zakariyau and Hamidi, Nur Hamizah (2025) Improving the performance of TFS with ensemble learning for cross-project software defect prediction. International Journal of Advanced Computer Science and Applications, 16 (11). pp. 214-219. ISSN 2158-107X; eISSN: 2156-5570

Abstract

Software defect prediction (SDP) plays a key role in improving software quality by identifying defect-prone modules early in the development cycle. While within-project prediction has been widely studied, cross-project defect prediction (CPDP) remains challenging due to differences in datasets, high feature dimensionality, and poor model generalization. To address these challenges, this study enhances the Transformation and Feature Selection (TFS) approach by integrating ensemble learning techniques. Three methods, Gradient Boosting Machine (GBM), stacking, and hybridization, were explored to evaluate their effectiveness in improving CPDP performance. Experiments were conducted using the AEEEM datasets, with preprocessing steps including normalization, feature reduction, and the Synthetic Minority Oversampling Technique (SMOTE) to handle data imbalance. The models were trained on source projects and tested on separate target projects, with the F1 score used as the main evaluation metric. Results show that the TFS × Stacking model achieved the highest overall performance, with a mean F1 score of 0.963, outperforming both TFS × GBM (0.958) and TFS × Hybridization (0.920). Compared to the original TFS × Random Forest method, the stacking approach consistently provided significant improvements across all project pairs. These findings highlight the potential of combining TFS with ensemble learning to enhance defect prediction in projects with limited or no historical data. This work not only advances CPDP research but also offers practical value to software teams by enabling more accurate identification of defect-prone modules and better allocation of testing resources.
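Two of the building blocks named in the abstract, SMOTE oversampling and the F1 score, can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the function names `f1_score` and `smote_like`, the neighbour count `k`, and the toy data are assumptions made for illustration, and the sketch covers only the core interpolation idea of SMOTE rather than the full algorithm or the TFS pipeline. In practice a library implementation, such as the one in imbalanced-learn, would normally be used.

```python
import random


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN), computed for the positive (defective) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples (hypothetical helper).

    Each new point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [m for m in minority if m is not base]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy usage: three defective modules described by two made-up metrics.
defective = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
extra = smote_like(defective, n_new=5)
score = f1_score([1, 1, 0, 0], [1, 0, 1, 0])  # one TP, one FP, one FN
```

In a cross-project setting, oversampling of this kind would be applied only to the source (training) project, while the F1 score is computed on the untouched target project.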


Download File

124677.pdf - Published Version
Available under License Creative Commons Attribution.

Additional Metadata

Item Type: Article
Subject: Computer Science (all)
Divisions: Faculty of Computer Science and Information Technology
DOI Number: https://doi.org/10.14569/ijacsa.2025.0161123
Publisher: Science and Information Organization
Keywords: Cross-project; Defect prediction; Ensemble learning; Feature selection; Software
Sustainable Development Goals (SDGs): SDG 9: Industry, Innovation and Infrastructure, SDG 16: Peace, Justice and Strong Institutions, SDG 17: Partnerships for the Goals
Depositing User: MS. HADIZAH NORDIN
Date Deposited: 21 Apr 2026 07:23
Last Modified: 21 Apr 2026 07:23
Altmetrics: http://www.altmetric.com/details.php?domain=psasir.upm.edu.my&doi=10.14569/ijacsa.2025.0161123
URI: http://psasir.upm.edu.my/id/eprint/124677
