Building a French Stemmer Using a Dictionary of French Root Words

Idi, Fulayi (1999) Building a French Stemmer Using a Dictionary of French Root Words. Masters thesis, Universiti Putra Malaysia.


Abstract

This thesis develops a strong French stemming algorithm based on a dictionary of French root words. Four modules are defined for this purpose. The first module builds a list of French root words and a list of affixes, that is, prefixes, suffixes, and prefix-suffix pairs. The second module removes punctuation from the words to be stemmed and removes stop words from the corpus. After this second module the words are noise-free, which leads to the third module, the stemming proper. The stemming order adopted is prefix, then suffix, and finally prefix-suffix pairs. Each word to be stemmed is first compared against the dictionary of French root words to check whether it is a root word. Then the actual stemming process is performed. The resulting stemming algorithm is tested against selected criteria, including inflection removal, prefix stripping, and suffix stripping. On all of these tests, the new French stemming algorithm performs better than the existing French stemmer, Savoy's stemmer. Further tests measure the performance of the new French stemmer in terms of understemming, overstemming, ambiguous stemming, and dictionary errors. The new French stemmer produces fewer understemming, overstemming, and ambiguous stemming errors than Savoy's stemmer, but more dictionary errors.
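The pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the tiny word lists (`ROOTS`, `PREFIXES`, `SUFFIXES`, `PAIRS`, `STOP_WORDS`) and the function names are hypothetical placeholders chosen for illustration, and the rule that an affix is stripped only when the remainder is a dictionary root is an assumption about how the dictionary lookup is used.

```python
import string

# Hypothetical miniature resources (assumptions for illustration only;
# the thesis's actual root dictionary and affix lists are not shown here).
ROOTS = {"faire", "port", "national", "chant"}
PREFIXES = ["inter", "re", "dé"]
SUFFIXES = ["ement", "eur", "er", "s"]
PAIRS = [("re", "er"), ("dé", "er")]          # prefix-suffix pairs
STOP_WORDS = {"le", "la", "les", "de", "et", "un", "une"}

# Map every ASCII punctuation character to a space.
_PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean(text):
    """Remove punctuation and stop words (the second module)."""
    words = text.lower().translate(_PUNCT_TABLE).split()
    return [w for w in words if w not in STOP_WORDS]


def stem(word):
    """Stem one word: root check, then prefix, suffix, prefix-suffix pair."""
    if word in ROOTS:                          # already a root word
        return word
    for p in PREFIXES:                         # 1. prefix stripping
        if word.startswith(p) and word[len(p):] in ROOTS:
            return word[len(p):]
    for s in SUFFIXES:                         # 2. suffix stripping
        if word.endswith(s) and word[:-len(s)] in ROOTS:
            return word[:-len(s)]
    for p, s in PAIRS:                         # 3. prefix-suffix pairs
        if word.startswith(p) and word.endswith(s):
            core = word[len(p):-len(s)]
            if core in ROOTS:
                return core
    return word                                # no rule applied; keep as is


def stem_text(text):
    """Clean a text, then stem each remaining word."""
    return [stem(w) for w in clean(text)]
```

Under these assumed lists, `stem("refaire")` strips the prefix to give `faire`, `stem("chanteur")` strips the suffix to give `chant`, and `stem("reporter")` falls through to the prefix-suffix pair step to give `port`. Requiring the remainder to be a known root is one simple way to limit overstemming, at the cost of leaving words untouched when the dictionary lacks their root.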

Item Type: Thesis (Masters)
Subject: French language
Chairman Supervisor: Hjh. Fatimah Ahmad, PhD
Call Number: FSKTM 1999 2
Faculty or Institute: Faculty of Computer Science and Information Technology
ID Code: 9628
Deposited By: Laila Azwa Ramli
Deposited On: 17 Feb 2011 06:56
Last Modified: 17 Feb 2011 06:57



Universiti Putra Malaysia Institutional Repository

Universiti Putra Malaysia Institutional Repository is an online digital archive that serves as a central collection and store of scientific information and research at Universiti Putra Malaysia.

Currently, the collections deposited in the IR consist of Master's and PhD theses, Master's and PhD project reports, journal articles, journal bulletins, conference papers, UPM news, newspaper cuttings, patents, and inaugural lectures.

As university policy does not permit users to view theses in full text, access is given to the first 24 pages only.