UPM Institutional Repository

Multiword phrases indexing for Malay-English cross-language information retrieval


Citation

Rais, Nurjannaton Hidayah and Abdullah, Muhamad Taufik and Abdul Kadir, Rabiah (2011) Multiword phrases indexing for Malay-English cross-language information retrieval. Information Technology Journal, 10 (8). pp. 1554-1562. ISSN 1812-5638; ESSN: 1812-5646

Abstract

Cross-Language Information Retrieval (CLIR) is the process of submitting a query in one language and retrieving relevant documents written in a different language. A popular approach to CLIR is to translate the query into the language of the documents being retrieved. One of the simplest and most effective methods for query translation is dictionary look-up based on a bilingual dictionary. However, limited dictionary coverage gives rise to two problems: the handling of proper names and of compound words. Relevant concept words, consisting of proper names and compound words, were applied in the document indexing, query indexing and query translation processes. We believe that concept-based indexing and translation make the translation of proper names and compound words possible. A series of experiments was conducted to test the compound word and proper name translation methods in a CLIR system. The best retrieval performance was obtained by combining the query translation approach that selects all translations listed in the dictionary with an alternative weighting scheme and proper names identification and translation. For both the Malay and English document collections, this combination outperformed the query translation approach that selects all translations listed in the dictionary, used on its own, by 1.0 and 9%. The results show that the translation of proper names and compound words is important in query translation for Malay-English CLIR.
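As an illustration only, the dictionary look-up idea described in the abstract might be sketched as follows. This is not the authors' implementation: the toy dictionary entries, the `translate_query` function, the `MAX_PHRASE_LEN` limit and the capitalization heuristic for proper names are all assumptions made for the example. The sketch matches the longest multiword phrase first, keeps every translation listed for a matched entry, and passes runs of capitalized out-of-dictionary tokens through unchanged as a single proper name.

```python
from typing import Dict, List

# Hypothetical toy Malay-English dictionary; entries are illustrative only.
TOY_DICT: Dict[str, List[str]] = {
    "kereta api": ["train", "railway"],
    "kereta": ["car", "cart"],
    "api": ["fire"],
    "laju": ["fast", "speedy"],
}

MAX_PHRASE_LEN = 3  # assumed upper bound on phrase length, in tokens


def translate_query(query: str, dictionary: Dict[str, List[str]] = TOY_DICT) -> List[str]:
    """Translate a query by dictionary look-up, keeping all listed translations."""
    tokens = query.split()
    out: List[str] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first so compounds win over their parts.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            if phrase in dictionary:
                out.extend(dictionary[phrase])  # select all translations
                i += n
                matched = True
                break
        if not matched:
            # Out-of-dictionary token: keep it untranslated. As an assumed
            # heuristic, a run of capitalized tokens is kept together as one
            # proper name.
            j = i + 1
            if tokens[i][:1].isupper():
                while j < len(tokens) and tokens[j][:1].isupper():
                    j += 1
            out.append(" ".join(tokens[i:j]))
            i = j
    return out


if __name__ == "__main__":
    print(translate_query("kereta api laju ke Kuala Lumpur"))
```

In this sketch the compound "kereta api" is translated as a unit rather than word by word, which is the kind of error that phrase-level indexing is meant to avoid; a real system would also need the weighting scheme mentioned in the abstract, which is not modelled here.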


Additional Metadata

Item Type: Article
Divisions: Faculty of Computer Science and Information Technology
DOI Number: https://doi.org/10.3923/itj.2011.1554.1562
Publisher: Asian Network for Scientific Information
Keywords: Concept-based IR; Cross-language information retrieval; Query translation; Bilingual dictionary; Proper names identification and translation
Depositing User: Nabilah Mustapa
Date Deposited: 12 Nov 2019 07:39
Last Modified: 12 Nov 2019 07:39
Altmetrics: http://www.altmetric.com/details.php?domain=psasir.upm.edu.my&doi=10.3923/itj.2011.1554.1562
URI: http://psasir.upm.edu.my/id/eprint/22487
