Incorporation of Contextual Retrieval and Data Fusion Approach Towards Improving The Retrieval Precision.
Alidin, Az Azrinudin (2007) Incorporation of Contextual Retrieval and Data Fusion Approach Towards Improving The Retrieval Precision. Masters thesis, Universiti Putra Malaysia.
Generally, the functionality of information retrieval (IR) can be divided into two parts: one deals with search and retrieval, while the other is concerned with subject or content analysis. In the search and retrieval part, an IR system presents a ranked list of relevant documents in response to the query that the user submits as a representation of an information need. The ranked list indicates the probability that each document is relevant to the query, with the most relevant document placed at the top. However, queries are often formulated as short, simplified terms such as "Java". Such terms cannot precisely summarise the user's information need and its context, i.e. "java, programming language" or "java, the island". Consequently, the user's information need is not satisfied, because the most relevant document is not ranked accordingly or too many relevant documents are presented in the ranked list. Moreover, context is not easily extracted from such simplified queries, and in recent years there has been much research interest in contextual retrieval. Like IR, contextual retrieval retrieves relevant documents, but it does so by combining the query, the user context and search technology into a single framework. Furthermore, in contextual retrieval the user's context is exploited to identify the relevant documents that are useful at the time the request occurs. On the other hand, different IR schemes calculate the relevance probability differently when matching queries against document representations. As a result, retrieval precision often differs between IR schemes, which present dissimilar lists of relevant documents for the same query. A data fusion approach is therefore applied in IR to overcome this complication by combining multiple sources of results.
Implementing a data fusion approach in IR involves merging the retrieval results from different IR schemes into a single unified ranked list that is expected to rank relevant documents with high precision. This study presents an approach that incorporates contextual retrieval and data fusion, using a one-keyword query, to improve retrieval precision. The methods for identifying user context are categorised into four approaches: relevance feedback, user profiles, word-sense disambiguation and knowledge engineering. In order to extract user context and to model contextual retrieval, this study implements term-weighting schemes based on the user profile and knowledge engineering approaches for the Watson scheme, and on the word-sense disambiguation approach for the WordSieve scheme. Five randomly selected documents are submitted to these schemes, and the extracted user context is used to expand the initial query for the retrieval process. In addition, because the precision improvement may not be achieved, the feasibility of adopting a data fusion approach is assessed by testing two preconditions: the efficacy and dissimilarity tests for the candidate IR schemes. Two queries, Java and Jaguar, expanded using the user context extracted by Watson and WordSieve, are submitted, and more than ten thousand documents are collected as the data collection for the experiment. The performance of the experiment is evaluated using three assessments: the precision-recall graph, precision evaluation based on document rank, and mean average precision.
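The merging step described above can be illustrated with a minimal sketch. The abstract does not state which fusion rule the thesis uses, so the sketch assumes CombSUM over min-max normalised scores, a common choice in the data fusion literature. The function names, document identifiers and scores are all hypothetical and serve only as illustration.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one scheme's scores to [0, 1] so that schemes are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All documents scored equally; treat them as equally strong.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(result_lists):
    """Fuse several {doc_id: score} results by summing normalised scores.

    CombSUM is an assumed fusion rule, not necessarily the thesis's method.
    """
    fused = defaultdict(float)
    for scores in result_lists:
        for doc, s in min_max_normalize(scores).items():
            fused[doc] += s
    # Highest fused score first; ties are broken by document id.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up scores standing in for the output of two IR schemes.
watson_run = {"d1": 0.9, "d2": 0.4, "d3": 0.1}
wordsieve_run = {"d2": 0.8, "d4": 0.6, "d1": 0.2}

fused_ranking = comb_sum([watson_run, wordsieve_run])
```

In this toy input, a document that both schemes score well accumulates evidence from each and can rise above a document that only one scheme ranks highly, which is the intuition behind requiring the candidate schemes to be both effective and dissimilar.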
The data fusion experiment based on contextual retrieval results reveals a significant improvement in retrieval precision: based on the mean average precision calculation, the lowest gain is approximately thirty-seven percent compared to the basic IR scheme, ten percent compared to Watson and fifteen percent compared to WordSieve.
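For readers unfamiliar with the measure, mean average precision can be computed as in the following sketch. It uses the standard textbook definition (precision at each rank where a relevant document appears, averaged over the relevant set, then averaged over queries). The function names and the toy rankings are hypothetical and are not taken from the thesis data.

```python
def average_precision(ranked_docs, relevant):
    """Average of the precision values at each rank holding a relevant document."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    # Dividing by the full relevant set penalises relevant documents never retrieved.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of the average precision over (ranked_docs, relevant_set) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy rankings for two queries; identifiers and judgements are made up.
toy_runs = [
    (["a", "b", "c", "d"], {"a", "c"}),
    (["x", "y"], {"y"}),
]
toy_map = mean_average_precision(toy_runs)
```

A relative gain such as those reported above would then be the difference between the fused and baseline mean average precision values, divided by the baseline value.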