UPM Institutional Repository

GA-based feature subset selection in a spam/non-spam detection system


Citation

Behjat, Amir Rajabi and Mustapha, Aida and Nezamabadi-pour, Hossein and Sulaiman, Md. Nasir and Mustapha, Norwati (2012) GA-based feature subset selection in a spam/non-spam detection system. In: International Conference on Computer and Communication Engineering (ICCCE 2012), 3-5 July 2012, Kuala Lumpur, Malaysia. (pp. 675-679).

Abstract

Spam has created a significant security problem for computer users everywhere. Spammers use deceptive techniques to disguise the parts of a message that could be used to identify it as spam. Moreover, sending junk mail costs a spammer little money or bandwidth, even for more than one hundred emails. From the feature selection perspective, one specific problem that decreases the accuracy of spam and non-spam email classification is high data dimensionality. Reducing dimensionality therefore amounts to decreasing the number of irrelevant features. In this paper, a genetic algorithm (GA) is applied during feature selection in an effort to decrease the number of useless features in a high-dimensional collection of email bodies and subjects. Next, a Multi-Layer Perceptron (MLP) is employed to classify emails using the features selected by the GA. Using the LingSpam benchmark corpora as the dataset, the experimental results showed that a GA feature selector with the MLP classifier not only decreases the data dimensionality but also increases the spam detection rate compared with other classifiers such as SVM and Naïve Bayes.
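The general idea of GA-based feature subset selection can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no GA parameters, so the population size, mutation rate, tournament selection, one-point crossover, and subset-size penalty below are all assumptions. To stay dependency-free, a simple nearest-centroid classifier stands in for the MLP as the fitness evaluator, and a small synthetic dataset (in which only feature 0 is informative) replaces LingSpam.

```python
import random


def make_toy_data(n_samples=60, n_features=8, seed=1):
    """Synthetic stand-in for an email feature matrix (assumption, not LingSpam)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2                           # 0 = non-spam, 1 = spam
        row = [rng.random() for _ in range(n_features)]
        row[0] = label + rng.gauss(0, 0.1)      # only feature 0 carries signal
        X.append(row)
        y.append(label)
    return X, y


def centroid_accuracy(X, y, mask):
    """Training accuracy of a nearest-centroid classifier on the masked features.

    Stand-in for the MLP used in the paper (assumption made for brevity).
    """
    idx = [j for j, keep in enumerate(mask) if keep]
    if not idx:
        return 0.0                              # empty subset cannot classify
    cents = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(r[j] for r in rows) / len(rows) for j in idx]
    correct = 0
    for x, t in zip(X, y):
        pred = min(
            cents,
            key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(idx, cents[c])),
        )
        correct += pred == t
    return correct / len(X)


def fitness(X, y, mask, penalty=0.01):
    """Accuracy minus a small cost per selected feature (hypothetical weighting)."""
    return centroid_accuracy(X, y, mask) - penalty * sum(mask)


def ga_select(X, y, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Evolve a binary mask over features; 1 means the feature is kept."""
    rng = random.Random(seed)
    n = len(X[0])
    score = lambda m: fitness(X, y, m)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        new_pop = [best[:]]                     # elitism: carry the best mask over
        while len(new_pop) < pop_size:
            p1 = max(rng.sample(pop, 3), key=score)   # tournament selection
            p2 = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, n)                 # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=score)
    return best
```

Calling `ga_select(*make_toy_data())` returns a 0/1 mask over the eight toy features. In a setup closer to the paper, the fitness function would instead train and validate an MLP on the selected features of held-out email data.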



Additional Metadata

Item Type: Conference or Workshop Item (Paper)
Divisions: Faculty of Computer Science and Information Technology
DOI Number: https://doi.org/10.1109/ICCCE.2012.6271302
Publisher: IEEE
Keywords: Feature selection; Genetic algorithm; MLP; Spam detection
Depositing User: Nabilah Mustapa
Date Deposited: 14 Jul 2016 04:47
Last Modified: 14 Jul 2016 04:47
Altmetrics: http://www.altmetric.com/details.php?domain=psasir.upm.edu.my&doi=10.1109/ICCCE.2012.6271302
URI: http://psasir.upm.edu.my/id/eprint/47692
