UPM Institutional Repository

Integrated approach for improving cross-project software defect prediction performance


Citation

Bala, Yahaya Zakariyau (2024) Integrated approach for improving cross-project software defect prediction performance. Doctoral thesis, Universiti Putra Malaysia.

Abstract

This research addresses three critical challenges in cross-project defect prediction (CPDP): distribution differences, redundant features, and model overfitting. These issues often degrade prediction accuracy and robustness in various domains. To tackle them, this study proposes a holistic approach named Transformation, Feature Selection, and Multi-learning (TFSM). The research has three objectives: first, to propose transformation, feature selection, and multi-learning techniques that, respectively, mitigate distribution differences between datasets, identify and eliminate redundant features, and combat model overfitting; second, to integrate these techniques into TFSM and implement it; third, to evaluate each technique and the integrated approach. The methodology involves formulating, implementing, and evaluating each technique individually and in combination as TFSM. Experimental evaluations are conducted on open-source software projects drawn from an open-source repository, with F1-score serving as the primary evaluation metric. The experimental results demonstrate significant improvements in predictive performance. The transformation techniques effectively reduce distribution differences, enhancing the model's ability to generalize across diverse datasets. The feature selection methods mitigate the negative impact of redundant features, streamlining the learning process and improving model interpretability. Additionally, the multi-learning approach proves effective in reducing model overfitting by aggregating diverse model outputs. When integrated into the TFSM approach, these techniques collectively demonstrate a marked improvement in CPDP performance. TFSM leverages the strengths of each individual technique, resulting in a synergistic effect that enhances the model's predictive accuracy.
This approach addresses the multifaceted challenges inherent in CPDP, providing a more reliable and effective solution for defect prediction in software projects. This work contributes to the ongoing efforts in the software engineering community to develop more accurate and reliable defect prediction models, ultimately aiding in the development of higher-quality software. Future work will focus on further refining these techniques and exploring their applicability to a broader range of software projects and repositories.


Download File

Text: 120163.pdf (880kB)
Official URL or Download Paper: http://ethesis.upm.edu.my/id/eprint/18501

Additional Metadata

Item Type: Thesis (Doctoral)
Subject: Software engineering
Subject: Computer software - Testing
Subject: Machine learning
Call Number: FSKTM 2024 15
Chairman Supervisor: Pathiah binti Abdul Samat, PhD
Divisions: Faculty of Computer Science and Information Technology
Keywords: Cross-Project, Defect, Machine Learning, Prediction, Software
Depositing User: Ms. Rohana Alias
Date Deposited: 09 Oct 2025 08:38
Last Modified: 09 Oct 2025 08:38
URI: http://psasir.upm.edu.my/id/eprint/120163