Transformation of Extracted Knowledge in Malay Unstructured Documents Into an Interrogative Structured Form

Sidi, Fatimah (2007) Transformation of Extracted Knowledge in Malay Unstructured Documents Into an Interrogative Structured Form. PhD thesis, Universiti Putra Malaysia.

Abstract

Knowledge discovery operations help to extract valuable information and knowledge from large volumes of data in structured databases. However, a large portion of the available information is not in structured form but exists as collections of text documents in unstructured format, and this also applies to Malay unstructured documents. Therefore, structuring characteristics must be imposed on unstructured documents in order to transform the information they contain into knowledge. A new approach has been established to transform extracted knowledge in Malay unstructured documents by identifying, organizing, and structuring it into an interrogative structured form. Its architecture is based on the implementation of (i) interrogative knowledge identification; (ii) interrogative contextual information; and (iii) interrogative knowledge organization and structuring with Malay knowledge representation by concepts. It utilizes the Malay language corpus, interrogative theory, and the object-oriented, ontology, and database models. The research involves system development based on the architecture of the MalayIK-Ontology, which is measured by quantitative retrieval performance using the recall and precision metrics. The Retrieval Interrogative Ontology Analysis Application was developed to verify fitness for task in terms of the functionality and usefulness of interrogative contextual information with color-coding supplement, additional information annotation, and Malay knowledge representation by concepts. A number of experiments were carried out to quantify the accuracy of the knowledge extracted. The MalayIK-Ontology was tested using stratified random sampling drawn from various sources of Malay unstructured documents such as news, e-mails, articles, magazines, and texts from children's story books.
The results of the experiments show that the MalayIK-Ontology approach performed well compared with knowledge extracted manually by an expert. The results of the questionnaire evaluation of the Retrieval Interrogative Ontology Analysis Application show good achievement in helping users understand the main point of an unstructured document easily and clearly. The approach is intended to improve understanding of the process of making sense of information as knowledge, maintaining the meaning of the information, and arriving at an identical interpretation of the knowledge in an unstructured document, so that different people perceive the same knowledge.
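
The abstract evaluates extraction quality with recall and precision against an expert's manual extraction. As a minimal sketch of how those two metrics are conventionally computed, the following Python example compares a set of system-extracted items with an expert reference set. The function name `precision_recall` and the sample interrogative items are hypothetical illustrations, not taken from the thesis, and the thesis may define matching between items differently.

```python
def precision_recall(extracted, reference):
    """Return (precision, recall) of extracted items against a reference set.

    precision = correct extracted items / all extracted items
    recall    = correct extracted items / all reference items
    """
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical data: interrogative knowledge items (siapa = who, di mana = where,
# bila = when, apa = what, mengapa = why) extracted from one document.
system_items = {
    "siapa: Ali",
    "di mana: Kuala Lumpur",
    "bila: semalam",
    "apa: mesyuarat",
}
expert_items = {
    "siapa: Ali",
    "di mana: Kuala Lumpur",
    "bila: semalam",
    "mengapa: bajet",
}

precision, recall = precision_recall(system_items, expert_items)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here three of the four system items match the expert's items, and three of the four expert items are recovered, so both metrics come out at 0.75. Exact string matching is an assumption made for simplicity.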

Item Type: Thesis (PhD)
Subject: Knowledge acquisition (Expert systems)
Subject: Databases
Chairman Supervisor: Associate Professor Hj. Mohd Hasan Selamat
Call Number: FSKTM 2007 10
Faculty or Institute: Faculty of Computer Science and Information Technology
ID Code: 5887
Deposited By: Nur Izyan Mohd Zaki
Deposited On: 05 May 2010 07:52
Last Modified: 27 May 2013 07:25


Universiti Putra Malaysia Institutional Repository

Universiti Putra Malaysia Institutional Repository is an on-line digital archive that serves as a central collection and storage of scientific information and research at the Universiti Putra Malaysia.

Currently, the collections deposited in the IR consist of Master's and PhD theses, Master's and PhD project reports, journal articles, journal bulletins, conference papers, UPM news, newspaper cuttings, patents, and inaugural lectures.

As university policy does not permit users to view theses in full text, access is given to the first 24 pages only.