UPM Institutional Repository

A framework for English and Malay cross-lingual document alignment method


Citation

Nasharuddin, Nurul Amelina and Azman, Azreen and Abdullah, Muhamad Taufik and Abdul Kadir, Rabiah (2019) A framework for English and Malay cross-lingual document alignment method. International Journal of Advanced Trends in Computer Science and Engineering, 8 (1.3). pp. 190-195. ISSN 2278-3091

Abstract

Information divide issues in multilingual information retrieval are usually addressed by translating users’ queries into a language that the users understand. However, dictionaries and other translation resources are scarce for some Asian languages. The objective of this study was to automatically align English and Malay news documents into a comparable corpus that could serve as a translation resource for improving query translation in cross-lingual information retrieval. This study proposes a direct alignment framework that uses the similarity of each document's own textual features, and introduces a novel approach that uses the similarity of the documents' sentiment to improve the effectiveness of the alignment method. The proposed sentiment-based approach outperformed existing alignment methods and was more effective at differentiating related from unrelated documents. These aligned comparable documents can be further utilised in translation research for English and Malay cross-lingual information retrieval tasks.
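
The abstract does not specify which textual features, sentiment scores, or combination formula the framework uses. The following is therefore only an illustrative sketch of the general idea of combining a textual-feature similarity with a sentiment similarity, not the authors' method. All function names, the choice of features (capitalised words and numbers, which often survive translation unchanged), the weight of 0.7, and the assumption that each document already has a sentiment polarity in [-1, 1] are hypothetical.

```python
import re


def surface_features(text):
    """Hypothetical language-independent features: capitalised words and numbers."""
    return set(re.findall(r"[A-Z][a-z]+|\d+", text))


def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def sentiment_similarity(s_en, s_ms):
    """Map two polarity scores in [-1, 1] (assumed given) to a similarity in [0, 1]."""
    return 1.0 - abs(s_en - s_ms) / 2.0


def alignment_score(text_en, text_ms, s_en, s_ms, weight=0.7):
    """Weighted sum of textual and sentiment similarity (the weight is an assumption)."""
    textual = jaccard(surface_features(text_en), surface_features(text_ms))
    return weight * textual + (1.0 - weight) * sentiment_similarity(s_en, s_ms)


english = "Petronas reported a profit of 500 million in Kuala Lumpur"
related = "Petronas melaporkan keuntungan 500 juta di Kuala Lumpur"
unrelated = "Banjir melanda Kelantan, 300 penduduk dipindahkan"

# Sentiment values below are made-up inputs for illustration only.
score_related = alignment_score(english, related, 0.6, 0.5)
score_unrelated = alignment_score(english, unrelated, 0.6, -0.8)
```

In this toy setup the related Malay document scores higher than the unrelated one, because it shares named entities and numbers with the English document and has a similar polarity. A real system would need a proper sentiment analyser for each language and a tuned weight.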


Additional Metadata

Item Type: Article
Divisions: Faculty of Computer Science and Information Technology
DOI Number: https://doi.org/10.30534/ijatcse/2019/3881.32019
Publisher: The World Academy of Research in Science and Engineering
Keywords: Cross-lingual information retrieval; Document alignment; Malay language; Sentiment-based approach
Depositing User: Ms. Nuraida Ibrahim
Date Deposited: 10 Nov 2020 07:23
Last Modified: 10 Nov 2020 07:23
Altmetrics: http://www.altmetric.com/details.php?domain=psasir.upm.edu.my&doi=10.30534/ijatcse/2019/3881.32019
URI: http://psasir.upm.edu.my/id/eprint/80417