A strategy for extracting information from semi-structured web pages.

Citation

Shaker, Mahmoud and Ibrahim, Hamidah and Mustapha, Aida and Abdullah, Lili Nurliyana (2010) A strategy for extracting information from semi-structured web pages. International Journal of Web Information Systems, 6 (4). pp. 304-318. ISSN 1744-0084

Abstract

Purpose – The aim of this paper is to propose a strategy for extracting information from web tables. Design/methodology/approach – The paper presents a strategy for extracting information from the web tables of semi-structured web pages (WPs). The strategy handles the issue of synonyms, which arises because these WPs are designed and created without reference to any standards or guidelines. Findings – The paper finds that the strategy extracts information with high precision. Besides the attributes, it extracts the sub-attributes that describe them and the values of those sub-attributes. Practical implications – An experiment conducted on the Nokia products domain demonstrated that the proposed strategy extracts information from web tables with a precision of 98.98 percent. Originality/value – This paper contributes to the research on extracting information.

Download File

Full text not available from this repository.

Additional Metadata

Item Type:	Article
Subject:	Information retrieval.
Subject:	Text processing (Computer science).
Divisions:	Faculty of Computer Science and Information Technology
DOI Number:	https://doi.org/10.1108/17440081011090239
Keywords:	Data handling; Information retrieval; Internet.
Depositing User:	Umikalthom Abdullah
Date Deposited:	27 Jan 2012 01:25
Last Modified:	27 Jan 2012 01:25
Altmetrics:	http://www.altmetric.com/details.php?domain=psasir.upm.edu.my&doi=10.1108/17440081011090239
URI:	http://psasir.upm.edu.my/id/eprint/12868