UPM Institutional Repository

A review on building bilingual comparable corpora for resource-limited languages


Citation

Nasharuddin, Nurul Amelina and Abdullah, Muhamad Taufik and Azman, Azreen and Abdul Kadir, Rabiah (2018) A review on building bilingual comparable corpora for resource-limited languages. In: 2018 Fourth International Conference on Information Retrieval and Knowledge Management (CAMP'18), 26-28 Mar. 2018, Le Méridien Kota Kinabalu, Sabah, Malaysia. (pp. 113-118).

Abstract

Information retrieval tasks in certain Asian languages suffer from limited knowledge resources, such as bilingual and multilingual dictionaries and corpora. Thus, there is a need to create multilingual resources for these languages. One way to do so is to align documents automatically by estimating the likelihood that two documents, not necessarily written in the same language, are related to each other. Multilingual corpora can then be developed automatically from these aligned documents. Numerous approaches to document alignment have been developed to date. In this paper, we give an overview of recent progress in bilingual and multilingual document alignment within the last 5 years. In addition, we discuss the current progress in developing bilingual comparable corpora, especially for the Malay language, which is one of the resource-limited languages in Asia.
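The abstract does not name a specific alignment method. As an illustration only, the sketch below shows one simple, commonly used family of approaches: translate a source-language document word by word with a bilingual dictionary, then score it against each target-language document with cosine similarity over term counts and keep the best match above a threshold. This is not presented as the authors' method. The lexicon `MS_EN`, the sample documents, the helper names (`translate_terms`, `cosine`, `align`) and the 0.2 threshold are all hypothetical assumptions.

```python
import math
from collections import Counter

# Hypothetical toy Malay-to-English lexicon, for illustration only.
# "pilihan raya" (general election) is split word by word here, which shows
# a known weakness of naive dictionary lookup with multiword expressions.
MS_EN = {
    "banjir": "flood",
    "hujan": "rain",
    "lebat": "heavy",
    "kerajaan": "government",
    "bantuan": "aid",
    "undi": "vote",
    "pilihan": "election",
    "raya": "general",
}

# Hypothetical sample documents (not data from the paper).
MALAY_DOCS = [
    "hujan lebat menyebabkan banjir kerajaan hantar bantuan",
    "undi pilihan raya",
]
ENGLISH_DOCS = [
    "voters cast their vote in the general election",
    "heavy rain caused a flood and the government sent aid",
]


def translate_terms(tokens, lexicon):
    """Translate tokens word by word, dropping words missing from the lexicon."""
    return [lexicon[t] for t in tokens if t in lexicon]


def cosine(a, b):
    """Cosine similarity between two bag-of-words term-count vectors."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)  # Counter returns 0 for missing terms
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def align(source_docs, target_docs, lexicon, threshold=0.2):
    """Pair each source document with its most similar target document,
    keeping the pair only if the score reaches the (hypothetical) threshold."""
    if not target_docs:
        return []
    pairs = []
    for i, src in enumerate(source_docs):
        translated = translate_terms(src.lower().split(), lexicon)
        scores = [cosine(translated, tgt.lower().split()) for tgt in target_docs]
        best = max(range(len(target_docs)), key=lambda j: scores[j])
        if scores[best] >= threshold:
            pairs.append((i, best, round(scores[best], 3)))
    return pairs


if __name__ == "__main__":
    # prints [(0, 1, 0.707), (1, 0, 0.612)]
    print(align(MALAY_DOCS, ENGLISH_DOCS, MS_EN))
```

This greedy one-best matching lets several source documents map to the same target document; a comparable-corpus builder may instead want a one-to-one assignment, richer features (dates, named entities) or weighting such as TF-IDF, which this sketch deliberately leaves out.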



Additional Metadata

Item Type: Conference or Workshop Item (Paper)
Divisions: Faculty of Computer Science and Information Technology
DOI Number: https://doi.org/10.1109/INFRKM.2018.8464798
Publisher: IEEE
Keywords: Document alignment; Cross-lingual information retrieval; Comparable corpus; Malay language
Depositing User: Nabilah Mustapa
Date Deposited: 04 Jul 2019 04:48
Last Modified: 04 Jul 2019 04:48
Altmetrics: http://www.altmetric.com/details.php?domain=psasir.upm.edu.my&doi=10.1109/INFRKM.2018.8464798
URI: http://psasir.upm.edu.my/id/eprint/69531
