UPM Institutional Repository

Transformer-based model with CNN and CapsNets to improve Malay hate speech detection in tweets


Citation

Mustapha, Norwati and Abd Rahim, Nur Umaira (2024) Transformer-based model with CNN and CapsNets to improve Malay hate speech detection in tweets. Journal of Theoretical and Applied Information Technology, 102 (19). pp. 7091-7102. ISSN 1992-8645; eISSN: 1817-3195

Abstract

With the rise of social media, the spread of hate speech poses a significant threat to online harmony, especially within the Malay-speaking community. Existing research focuses mainly on high-resource languages such as English, leaving a gap in effective hate speech detection (HSD) for low-resource languages such as Malay. Although previous research has addressed Malay HSD, there is room for improvement, and the lack of diverse datasets may significantly limit a system's overall performance and generalization. This study therefore proposes a model that integrates RoBERTa, a transformer-based model, with convolutional neural networks (CNNs) and Capsule Networks (CapsNets). RoBERTa is highly effective at capturing contextual information bidirectionally. Experimental results show that the proposed RoBERTa-based model outperforms the other models on a new dataset, achieving an F1-score of 84.54% and an accuracy of 84.45%, and also on the existing dataset, achieving an F1-score of 77.67% and an accuracy of 77.45%. By offering a comprehensive architecture, this research not only advances the technical field but also addresses a social problem by enabling safer online environments for Malay-speaking communities. Additionally, this research contributes a valuable new Malay hate speech dataset, enriching the resources available for low-resource languages. The results underscore the importance of dataset diversity and advanced NLP techniques for generalizing well across different datasets, which makes the model practical for real-world applications. Furthermore, this study highlights the broader potential of these techniques for improving HSD in other low-resource languages.


Download File

Official URL or Download Paper: https://www.jatit.org/volumes/hundredtwo19.php

Additional Metadata

Item Type: Article
Divisions: Faculty of Computer Science and Information Technology
Publisher: Little Lion Scientific R&D
Keywords: Hate speech detection; Transformer; Natural language processing; XLNet; BERT; RoBERTa
Depositing User: Ms. Nuraida Ibrahim
Date Deposited: 21 Aug 2025 07:30
Last Modified: 21 Aug 2025 07:30
URI: http://psasir.upm.edu.my/id/eprint/119423