Monolingual and Cross-language Information Retrieval Approaches for Malay and English Language Document

Abdullah, Muhamad Taufik (2006) Monolingual and Cross-language Information Retrieval Approaches for Malay and English Language Document. PhD thesis, Universiti Putra Malaysia.


Abstract

This thesis concerns monolingual and cross-language information retrieval for Malay and English documents. It presents pioneering work on the aspects that are important for developing a Malay-English information retrieval system. An improved Malay stemming algorithm has been developed to reduce the various word forms to their common root for indexing and retrieving Malay documents. Four new stemming approaches are introduced for the Malay language: Rules-Frequency-Order (RFO), Minimum-Rules-Frequency-Order (MRFO), Rules-Frequency-Application-Order (RFAO), and Rules-Application-Frequency-Order (RAFO). The performance of the new Malay stemming algorithm and approaches is tested using the first two chapters of the Malay translation of the Quranic documents. The results show that the new stemming algorithm and approaches are superior to the previous stemming algorithm and approach. The retrieval effectiveness of the stemming algorithm and approaches is then tested on the actual Quranic collection using the vector space model and latent semantic indexing. The results show an improvement in performance from non-stemmed Malay to stemmed Malay, and from the previous stemming algorithm to the new one. Because the new stemming algorithm and approaches performed well in Malay monolingual information retrieval, a Malay-English cross-language information retrieval experiment was also performed. The results again show an improvement in performance from non-stemmed Malay to stemmed Malay, and from the previous stemming algorithm to the new one. In addition, the results reveal that the new Malay stemming performed better than English stemming in retrieving relevant documents. The results can serve as a reference for future experiments and research on cross-language document retrieval.
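
To make the comparison between non-stemmed and stemmed retrieval more concrete, the following is a minimal sketch of vector space ranking with an optional stemming step. It is an illustration only: the affix lists, the `toy_stem` function, the TF-IDF weighting, and the sample texts are assumptions made for this example. They are not the stemming algorithm, the RFO, MRFO, RFAO, or RAFO approaches, or the collection used in the thesis.

```python
import math
from collections import Counter

# Hypothetical toy affix lists for illustration only; NOT the thesis rule set.
PREFIXES = ("mem", "men", "ber", "di")
SUFFIXES = ("kan", "an")
MIN_STEM = 4  # assumed minimum number of letters left after stripping


def toy_stem(word):
    """Strip at most one listed prefix and one listed suffix from a word."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= MIN_STEM:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= MIN_STEM:
            word = word[:-len(s)]
            break
    return word


def tokenize(text, stem=False):
    """Lowercase, split on whitespace, and optionally stem each token."""
    words = text.lower().split()
    return [toy_stem(w) for w in words] if stem else words


def tfidf_vectors(token_lists):
    """Build sparse TF-IDF vectors (dicts) with a smoothed IDF."""
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}
    vectors = [
        {t: c * idf[t] for t, c in Counter(tokens).items()}
        for tokens in token_lists
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query, docs, stem=False):
    """Return (document indices sorted by score, raw scores per document)."""
    vectors, idf = tfidf_vectors([tokenize(d, stem) for d in docs])
    # Query terms that never occur in the collection get weight 0.0.
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(query, stem)).items()}
    scores = [cosine(q, v) for v in vectors]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order, scores


if __name__ == "__main__":
    docs = ["membaca buku", "makan nasi", "bacaan harian"]  # made-up sample texts
    print(rank("baca", docs, stem=False))
    print(rank("baca", docs, stem=True))
```

In this made-up sample, the unstemmed query "baca" shares no index term with any document, so every score is zero. With the toy stemmer, "membaca" and "bacaan" both reduce to "baca", and those two documents receive non-zero scores. This is the kind of effect that a stemmed versus non-stemmed retrieval comparison measures.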

Item Type: Thesis (PhD)
Subject: Bilingualism - Malay
Subject: Bilingualism - English
Chairman Supervisor: Associate Professor Hajah Fatimah Dato' Ahmad, PhD
Call Number: FSKTM 2006 1
Faculty or Institute: Faculty of Computer Science and Information Technology
ID Code: 5869
Deposited By: Nur Izyan Mohd Zaki
Deposited On: 05 May 2010 08:07
Last Modified: 27 May 2013 07:25




Universiti Putra Malaysia Institutional Repository

Universiti Putra Malaysia Institutional Repository is an online digital archive that serves as a central collection and store of scientific information and research at Universiti Putra Malaysia.

Currently, the collections deposited in the IR consist of Master's and PhD theses, Master's and PhD project reports, journal articles, journal bulletins, conference papers, UPM news, newspaper cuttings, patents, and inaugural lectures.

As university policy does not permit users to view theses in full text, access is given only to the first 24 pages.