UPM Institutional Repository

Applying semantic similarity measures to enhance topic-specific web crawling


Citation

Pesaranghader, Ali and Mustapha, Norwati and Pesaranghader, Ahmad (2013) Applying semantic similarity measures to enhance topic-specific web crawling. In: 2013 13th International Conference on Intelligent Systems Design and Applications (ISDA), 8-10 Dec. 2013, Bangi, Selangor, Malaysia. (pp. 205-212).

Abstract

As the Internet grows rapidly, finding the desired information becomes a tedious and time-consuming task. Topic-specific web crawlers, as ideal solutions, address this issue by traversing the Web and collecting information related to the topic of interest. Various methods have been proposed for this purpose. However, they rarely consider the intended sense of the given topic, which plays an important role in finding relevant web pages. In this paper, we attempt to improve topic-specific web crawling by disambiguating the sense of the topic. This avoids crawling irrelevant links associated with other senses of the topic. To this end, we consider the semantics of link hypertext and employ the Lin semantic similarity measure in our crawler, named LinCrawler, to distinguish links related to the topic sense from the others. We also compare LinCrawler against TFCrawler, which considers only the frequency of terms in hypertexts. Experimental results show that LinCrawler outperforms TFCrawler in collecting relevant web pages.
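The contrast between the two scoring strategies described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy taxonomy, the concept probabilities, and the function names (`lin_similarity`, `term_frequency_score`) are all hypothetical and chosen only for illustration. The Lin measure is taken in its standard form, sim(a, b) = 2 * IC(lcs(a, b)) / (IC(a) + IC(b)), where IC(c) = -log p(c) and lcs is the lowest common subsumer of the two concepts.

```python
import math

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "animal": "entity",
    "feline": "animal",
    "jaguar_cat": "feline",
    "leopard": "feline",
    "artifact": "entity",
    "vehicle": "artifact",
    "jaguar_car": "vehicle",
    "sedan": "vehicle",
}

# Hypothetical probabilities of encountering each concept (or any of its
# descendants) in a corpus. The root always has probability 1.0.
PROB = {
    "entity": 1.0,
    "animal": 0.4,
    "feline": 0.1,
    "jaguar_cat": 0.02,
    "leopard": 0.03,
    "artifact": 0.5,
    "vehicle": 0.2,
    "jaguar_car": 0.01,
    "sedan": 0.05,
}


def information_content(concept):
    """IC(c) = -log p(c); rarer concepts carry more information."""
    return -math.log(PROB[concept])


def ancestors(concept):
    """Return the path from a concept up to the root, concept first."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def lowest_common_subsumer(a, b):
    """Return the deepest concept that is an ancestor of both a and b."""
    seen = set(ancestors(a))
    for candidate in ancestors(b):
        if candidate in seen:
            return candidate
    raise ValueError("concepts do not share a root")


def lin_similarity(a, b):
    """Lin similarity: 2 * IC(lcs) / (IC(a) + IC(b))."""
    if a == b:
        return 1.0
    lcs = lowest_common_subsumer(a, b)
    denominator = information_content(a) + information_content(b)
    return 2.0 * information_content(lcs) / denominator


def term_frequency_score(topic_term, anchor_words):
    """Sense-blind baseline: fraction of anchor words equal to the topic term."""
    if not anchor_words:
        return 0.0
    return anchor_words.count(topic_term) / len(anchor_words)


if __name__ == "__main__":
    # Topic "jaguar" in its animal sense.
    print(lin_similarity("jaguar_cat", "leopard"))  # shares the 'feline' subsumer
    print(lin_similarity("jaguar_cat", "sedan"))    # shares only the root
    # Term frequency scores both anchors as relevant, whatever the sense.
    print(term_frequency_score("jaguar", ["jaguar", "habitat"]))
    print(term_frequency_score("jaguar", ["jaguar", "dealership", "prices"]))
```

In this toy setting, a term-frequency score is nonzero for any anchor text containing the word "jaguar", while the Lin score separates anchors whose concepts are close to the chosen sense from those that are not. A real crawler would need a full lexical resource and a way to map hypertext words to concepts, both of which are outside this sketch.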



Additional Metadata

Item Type: Conference or Workshop Item (Paper)
Divisions: Faculty of Computer Science and Information Technology
DOI Number: https://doi.org/10.1109/ISDA.2013.6920736
Publisher: IEEE (IEEEXplore)
Keywords: Topic-specific web crawling; Link prediction; Information retrieval; Web data mining; Semantic web
Depositing User: Nursyafinaz Mohd Noh
Date Deposited: 03 Nov 2015 08:41
Last Modified: 03 Nov 2015 08:41
URI: http://psasir.upm.edu.my/id/eprint/41318