A strategy for extracting information from semi-structured web pages.

Shaker, Mahmoud and Ibrahim, Hamidah and Mustapha, Aida and Abdullah, Lili Nurliyana (2010) A strategy for extracting information from semi-structured web pages. International Journal of Web Information Systems, 6 (4). pp. 304-318. ISSN 1744-0084

Full text not available from this repository.

Abstract

Purpose – The aim of this paper is to propose a strategy for extracting information from web tables.

Design/methodology/approach – The paper presents a strategy for extracting information from the web tables of semi-structured web pages (WPs). The strategy handles the issue of synonyms, which arises because these WPs have been designed and created without reference to any standards or guidelines.

Findings – The paper finds that this strategy extracts information with high precision. It extracts the attributes together with the sub-attributes that describe them and the values of those sub-attributes.

Practical implications – An experiment conducted on the Nokia products domain demonstrated that the proposed strategy extracts information from web tables with a high precision of 98.98 percent.

Originality/value – This paper contributes to the research on extracting information.

Item Type: Article
Keyword: Data handling; Information retrieval; Internet.
Subject: Information retrieval.
Subject: Text processing (Computer science).
Faculty or Institute: Faculty of Computer Science and Information Technology
DOI Number: 10.1108/17440081011090239
Altmetrics: http://www.altmetric.com/details.php?domain=psasir.upm.edu.my&doi=10.1108/17440081011090239
ID Code: 12868
Deposited By: Umikalthom Abdullah
Deposited On: 27 Jan 2012 01:25
Last Modified: 27 Jan 2012 01:25


Universiti Putra Malaysia Institutional Repository

Universiti Putra Malaysia Institutional Repository is an on-line digital archive that serves as a central collection and storage of scientific information and research at the Universiti Putra Malaysia.

Currently, the collections deposited in the IR consist of Master and PhD theses, Master and PhD Project Reports, Journal Articles, Journal Bulletins, Conference Papers, UPM News, Newspaper Cuttings, Patents and Inaugural Lectures.

As the policy of the university does not permit users to view theses in full text, access is given to the first 24 pages only.