UPM Institutional Repository

Document enrichment using semantic tags for effective XML retrieval


Citation

Abubakar, Roko and C. Doraisamy, Shyamala and Azman, Azreen and Jantan, Azrul Hazri (2013) Document enrichment using semantic tags for effective XML retrieval. International Journal of Advancements in Computing Technology, 5 (13). pp. 138-146. ISSN 2005-8039

Abstract

Using XML to mark up document contents with user-defined, self-descriptive terms has made XML one of the most widely used technologies for information representation and exchange over the Internet. As a result, many documents are now represented and stored as XML documents on the web. Therefore, there is a need to develop precise, efficient and user-friendly search techniques. Existing systems that support Content Only (CO) queries fall into three categories: Lowest Common Ancestor (LCA)-based systems, query structuring systems and document structure-based systems. The answers returned by the first group are either irrelevant to the user's search intention or may not be meaningful or informative enough, because of the restriction on the choice of the root node. Query structuring systems mostly require a data schema for query conversion, and such a schema is not always available, or is complex and fast evolving. Most existing systems place their emphasis on the query side. In this paper, we focus on the document side instead. Our approach exploits document structure: we enrich the text of Wikipedia XML documents with the annotated semantic tags present in the documents. The effect of enriching elements’ text content is investigated through three retrieval experiments in which only the text content of the document collection differs. The results of the experiments reveal that enriching elements’ text content with the semantic tags can improve the effectiveness of CO queries.
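
The abstract does not spell out the enrichment procedure, so the following is only a minimal sketch of one plausible reading: the name of each semantic tag is appended to the text of the element it labels, so that a plain keyword index over element text can also match the tag's meaning. The function name `enrich`, the `STRUCTURAL_TAGS` set and the sample document are all hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of purely structural tags to leave untouched.
# The paper's actual choice of tags is not given in the abstract.
STRUCTURAL_TAGS = {"article", "sec", "p"}


def enrich(xml_text: str) -> str:
    """Append each semantic tag name to the text of the element it labels."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag in STRUCTURAL_TAGS:
            continue
        # Turn a tag such as "nobel_laureate" into plain words.
        label = elem.tag.replace("_", " ")
        # Only the element's leading text is extended; tails and children
        # are left as they are.
        elem.text = f"{elem.text or ''} {label}".strip()
    return ET.tostring(root, encoding="unicode")


# Made-up sample document with two semantic tags.
sample = (
    "<article><p><physicist>Albert Einstein</physicist> "
    "was born in <city>Ulm</city>.</p></article>"
)
enriched = enrich(sample)
print(enriched)
```

In this sketch a CO query such as "physicist Ulm" could match the enriched text even though the word "physicist" never appears in the original sentence. Whether the authors append, prepend or otherwise weight the tag terms is not stated here.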



Additional Metadata

Item Type: Article
Divisions: Faculty of Computer Science and Information Technology
Publisher: Advanced Institute of Convergence Information Technology (AICIT)
Keywords: Content-Only Query (CO); Content and Structure Only Query (CAS); XML retrieval; Annotated semantic tag
Depositing User: Ms. Nida Hidayati Ghazali
Date Deposited: 06 Feb 2015 08:02
Last Modified: 09 Sep 2015 06:07
URI: http://psasir.upm.edu.my/id/eprint/30605
