UPM Institutional Repository

Enhance efficiency of answering XML keyword query using incompact structure of MCCTree


Citation

Sazaly, Ummu Sulaim (2012) Enhance efficiency of answering XML keyword query using incompact structure of MCCTree. Masters thesis, Universiti Putra Malaysia.

Abstract

People today live much of their lives online, where a task can be completed simply by typing on a keyboard and letting a system carry out the process. Because this interaction takes place online, data sharing is the most important service for sending and delivering information. Extensible Markup Language (XML) has been adopted as a leading data-sharing medium because it is easy for both humans and machines to interpret. Owing to its importance, many studies have sought to improve the effectiveness of retrieving information from XML files, and many notions and techniques have been introduced, particularly for query processing. The Compact Lowest Common Ancestor (CLCA) and Maximal Compact Lowest Common Ancestor (MCLCA) notions, implemented in the algorithms CGTreeGenerator and MCCTreeGenerator, have been shown to return accurate results when answering XML keyword queries. CGTreeGenerator compacts the XML tree by eliminating irrelevant nodes based on the CLCA notion, producing a Compact Global Tree (CGTree). MCCTreeGenerator then uses the CGTree to select a subtree, called a Maximal Compact Connected Tree (MCCTree), as the query result based on the MCLCA notion. However, the MCCTree cannot be used directly by the ranking method, because the ranking calculation relies on the structure of the subtree as it was before compaction. If the result cannot be used directly by the ranking method, the algorithm contains an ineffective process; moreover, if that process requires re-examining the original tree, the efficiency of the algorithm is reduced. This study responds to these weaknesses by proposing a new algorithm, XMCCTreeGenerator, to improve on the efficiency of CGTree-MCCTreeGenerator. The study identifies the processes needed to produce an XML query result using the MCLCA notion without compacting the tree. These processes make up the XMCCTreeGenerator algorithm, which produces the same subtree as the MCCTree but with a different structure.
This new returned subtree, called the Extended MCCTree (XMCCTree), can be used directly by the ranking method because it keeps an uncompacted structure. An experiment was run using XML datasets from the XML Data Repository on the University of Washington's website. Two files with different data structures were selected, and each was divided into three size ranges. Keywords were selected manually at random from the files, and queries were executed with three to five keywords. Two prototypes were developed, one implementing CGTree-MCCTreeGenerator and the other XMCCTreeGenerator. Since this study focuses on the efficiency of the algorithm, the elapsed time of each execution was recorded. In conclusion, the proposed XMCCTreeGenerator is more efficient than the previous CGTree-MCCTreeGenerator in answering XML keyword queries using MCLCA.
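
For readers unfamiliar with LCA-based XML keyword search, the following is a minimal Python sketch of the general idea only. It does not implement CLCA, MCLCA, CGTreeGenerator, MCCTreeGenerator, or XMCCTreeGenerator, whose definitions are given in the thesis itself. Instead it assumes the simpler Smallest LCA notion as a stand-in: for every combination of keyword matches it computes the lowest common ancestor, keeps only those ancestors that have no other such ancestor below them, and records the elapsed time of the query. The sample document and the names `keyword_paths`, `common_prefix`, `slca_paths`, and `node_at` are hypothetical and chosen for illustration.

```python
import time
import xml.etree.ElementTree as ET
from itertools import product

# Hypothetical sample document, not one of the datasets used in the thesis.
SAMPLE = """<bib>
  <book><title>XML Search</title><author>Lee</author></book>
  <book><title>Databases</title><author>Tan</author></book>
</bib>"""


def keyword_paths(root, keyword):
    """Return child-index paths of elements whose text contains the keyword."""
    paths = []

    def walk(node, path):
        if node.text and keyword.lower() in node.text.lower():
            paths.append(path)
        for i, child in enumerate(node):
            walk(child, path + (i,))

    walk(root, ())
    return paths


def common_prefix(paths):
    """Longest shared prefix of several paths, i.e. the path of their LCA."""
    prefix = paths[0]
    for p in paths[1:]:
        n = 0
        while n < min(len(prefix), len(p)) and prefix[n] == p[n]:
            n += 1
        prefix = prefix[:n]
    return prefix


def slca_paths(root, keywords):
    """Smallest-LCA answers (an assumed baseline, not the thesis's MCLCA)."""
    match_lists = [keyword_paths(root, k) for k in keywords]
    if any(not matches for matches in match_lists):
        return []  # some keyword does not occur in the document
    lcas = {common_prefix(combo) for combo in product(*match_lists)}
    # Keep an LCA only if no other LCA lies strictly below it.
    return sorted(p for p in lcas
                  if not any(q != p and q[:len(p)] == p for q in lcas))


def node_at(root, path):
    """Follow a child-index path from the root to an element."""
    node = root
    for i in path:
        node = node[i]
    return node


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    start = time.perf_counter()
    answers = slca_paths(root, ["XML", "Lee"])
    elapsed = time.perf_counter() - start
    for path in answers:
        print(node_at(root, path).tag, path)
    print(f"elapsed: {elapsed:.6f} s")
```

Because the answer is returned as a path into the original, uncompacted tree, a later step could inspect the full subtree without rebuilding it. This only loosely mirrors the motivation stated in the abstract and should not be read as the thesis's method.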


Download File

FSKTM 2013 3R.pdf (PDF, 721kB)

Additional Metadata

Item Type: Thesis (Masters)
Subject: XML (Document markup language)
Subject: Keyword searching
Call Number: FSKTM 2013 3
Chairman Supervisor: Associate Professor Mohd Hasan bin Selamat
Divisions: Faculty of Computer Science and Information Technology
Depositing User: Haridan Mohd Jais
Date Deposited: 13 Jan 2016 09:21
Last Modified: 13 Jan 2016 09:21
URI: http://psasir.upm.edu.my/id/eprint/38635