UPM Institutional Repository

XAIRF-WFP: a novel XAI-based random forest classifier for advanced email spam detection


Citation

Bouke, Mohamed Aly and Alramli, Omar Imhemed and Abdullah, Azizol (2024) XAIRF-WFP: a novel XAI-based random forest classifier for advanced email spam detection. International Journal of Information Security, 24. art. no. 5. pp. 1-19. ISSN 1615-5262; eISSN: 1615-5270

Abstract

Spam detection is a critical cybersecurity and information management task with significant implications for security decision-making. Traditional machine learning algorithms such as Logistic Regression (LR), K-Nearest Neighbors (KNN), Decision Trees (DT), and Support Vector Machines (SVM) have been employed to address it. However, these algorithms often suffer from the "black box" dilemma: a lack of transparency that limits their applicability in security contexts, where understanding the reasoning behind a classification is essential for effective risk assessment and mitigation. To address this limitation, this paper applies Explainable Artificial Intelligence (XAI) principles to introduce a novel, more transparent approach to spam detection based on a Random Forest (RF) classifier. The methodology has three parts: data balancing through Hybrid Random Sampling, feature selection using the Gini Index, and two-layer model explainability via Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP). The model achieved an accuracy of 94.8% with high precision and recall, outperforming LR, KNN, DT, and SVM across all key performance metrics. The results support the effectiveness of the proposed methodology, which offers a robust and interpretable model for spam detection.
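
The balancing and feature-ranking steps named in the abstract can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything here is an assumption rather than the authors' method: "hybrid random sampling" is taken to mean randomly undersampling larger classes and oversampling smaller ones to the mean class size, and Gini-based selection is taken to mean ranking binary features by the Gini impurity decrease of a single split. The function names and the toy data are hypothetical, and the Random Forest training and LIME/SHAP explanation stages are not shown.

```python
import random
from collections import Counter


def hybrid_random_sample(rows, labels, seed=0):
    """Balance classes to the mean class size (assumed interpretation):
    larger classes are undersampled, smaller classes are oversampled."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = sum(len(v) for v in by_class.values()) // len(by_class)
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if len(members) >= target:
            # Undersample: draw without replacement.
            chosen = rng.sample(members, target)
        else:
            # Oversample: keep all originals, add random duplicates.
            chosen = members + rng.choices(members, k=target - len(members))
        out_rows.extend(chosen)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_gain(rows, labels, feature):
    """Impurity decrease from splitting on one binary (0/1) feature."""
    left = [y for x, y in zip(rows, labels) if x[feature] == 0]
    right = [y for x, y in zip(rows, labels) if x[feature] == 1]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


def select_features(rows, labels, k):
    """Return the indices of the k features with the largest Gini gain."""
    n_features = len(rows[0])
    ranked = sorted(range(n_features),
                    key=lambda f: gini_gain(rows, labels, f),
                    reverse=True)
    return ranked[:k]


# Hypothetical toy data: feature 0 marks spam exactly, feature 1 is noise.
rows = [(1, 0), (1, 1), (0, 0), (0, 1), (0, 0), (0, 1), (0, 0), (0, 1)]
labels = [1, 1, 0, 0, 0, 0, 0, 0]  # 2 spam, 6 ham (imbalanced)

balanced_rows, balanced_labels = hybrid_random_sample(rows, labels, seed=0)
top_features = select_features(balanced_rows, balanced_labels, k=1)
```

On this toy input the balanced set holds four examples per class, and feature 0 ranks first because it separates the two classes perfectly. In a full pipeline the selected columns would then be passed to a Random Forest implementation of the reader's choice.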


Additional Metadata

Item Type: Article
Divisions: Faculty of Computer Science and Information Technology
DOI Number: https://doi.org/10.1007/s10207-024-00920-1
Publisher: Springer Science and Business Media Deutschland GmbH
Keywords: Data balancing; Explainable artificial intelligence (XAI); Model explainability; Random forest classifier; Spam detection
Depositing User: Mohamad Jefri Mohamed Fauzi
Date Deposited: 22 Jul 2025 07:20
Last Modified: 22 Jul 2025 07:20
Altmetrics: http://www.altmetric.com/details.php?domain=psasir.upm.edu.my&doi=10.1007/s10207-024-00920-1
URI: http://psasir.upm.edu.my/id/eprint/118711