Citation
Idi, Fulayi
(1999)
Building a French Stemmer Using a Dictionary of French Root Words.
Masters thesis, Universiti Putra Malaysia.
Abstract
In this thesis, a strong French stemming algorithm based on a dictionary of
French root words is developed. The work is organized into four modules.
The first module builds a list of French root words and a list of affixes,
that is, prefixes, suffixes, and prefix-suffix pairs.
The second module removes punctuation from the words to be stemmed and
removes stop words from the corpus. After this module the words are free of
noise, which leads to the third module: the stemming proper.
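The second module's clean-up step can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the stop-word list is a small hypothetical sample, and punctuation is handled by keeping only runs of letters, so an elided form such as "l'école" splits into "l" and "école", after which "l" is dropped as a stop word.

```python
import re

# Hypothetical stop-word sample; the thesis's actual stop list is not given here.
STOP_WORDS = {"le", "la", "les", "l", "de", "d", "des", "du", "un", "une", "et", "en"}

# A run of letters (accented letters included); digits, underscores and
# punctuation act as separators.
LETTER_RUN = re.compile(r"[^\W\d_]+")

def preprocess(text: str) -> list[str]:
    """Lowercase the text, strip punctuation, and drop stop words."""
    tokens = LETTER_RUN.findall(text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("Les enfants de l'école, chantent!"))
# ['enfants', 'école', 'chantent']
```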
The stemming order adopted is prefix, then suffix, and finally prefix-suffix
pairs. Before any stripping, each word is compared with the dictionary of French
root words to check whether it is already a root word; only then is the actual
stemming performed. The resulting algorithm is tested against selected criteria,
including inflection removal, prefix stripping, and suffix stripping. On all of
these tests, the new French stemming algorithm performs better than the existing
French stemmer, Savoy's stemmer.
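The dictionary check followed by the prefix, suffix, and prefix-suffix order can be sketched as below. This is a hypothetical illustration under stated assumptions: the root dictionary and affix lists are tiny made-up samples, affixes are tried longest first, and a candidate stem is accepted only if it appears in the root dictionary. The abstract does not say how competing affixes are resolved or what is returned when no root is found; here the word is simply left unchanged.

```python
# Hypothetical miniature resources (the thesis builds much larger lists).
ROOTS = {"chant", "fait", "mont", "faire", "lire"}
PREFIXES = ["re", "dé"]
SUFFIXES = ["ement", "ons", "ent", "er", "es", "s", "e"]
PREFIX_SUFFIX_PAIRS = [("re", "er"), ("dé", "er")]

def by_length(affixes):
    """Try longer affixes first, so 'ement' is tested before 'e'."""
    return sorted(affixes, key=len, reverse=True)

def stem(word: str) -> str:
    # Step 0: a word that is already a root word is returned as is.
    if word in ROOTS:
        return word
    # Step 1: prefix stripping, accepted only if a root word remains.
    for p in by_length(PREFIXES):
        if word.startswith(p) and word[len(p):] in ROOTS:
            return word[len(p):]
    # Step 2: suffix stripping, accepted only if a root word remains.
    for s in by_length(SUFFIXES):
        if word.endswith(s) and word[:-len(s)] in ROOTS:
            return word[:-len(s)]
    # Step 3: strip a prefix and a suffix together.
    pairs = sorted(PREFIX_SUFFIX_PAIRS, key=lambda ps: len(ps[0]) + len(ps[1]), reverse=True)
    for p, s in pairs:
        if word.startswith(p) and word.endswith(s) and len(word) > len(p) + len(s):
            core = word[len(p):len(word) - len(s)]
            if core in ROOTS:
                return core
    # No dictionary-confirmed stem was found: leave the word unchanged.
    return word

for w in ["chant", "refait", "chantent", "remonter", "inconnu"]:
    print(w, "->", stem(w))
```

Requiring every candidate to be a dictionary entry keeps the sketch conservative: "remonter" survives the prefix and suffix steps untouched and is only reduced to "mont" by the prefix-suffix pair ("re", "er").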
Tests are also carried out to measure the performance of the new French
stemmer in terms of understemming, overstemming, ambiguous stemming, and
dictionary errors. The new French stemmer produces fewer understemming,
overstemming, and ambiguous-stemming errors than Savoy's stemmer. However, it
produces more dictionary-borne errors than Savoy's stemmer.