Genetic Algorithm for Web Data Mining

Loo, Kevin Teow Aik (2001) Genetic Algorithm for Web Data Mining. Masters project report, Universiti Putra Malaysia.

PDF (1297Kb)

Abstract

The choice of search engine can influence the number of results returned for a search on the World Wide Web. This study therefore attempted to discover any association between the types of words or information searched for and the available search engines, with the aim of assisting the process of mining the World Wide Web for information. The study used a prototype program based on a genetic algorithm to manipulate the initial set of data. Three sets of inputs were used to generate new populations based on individual fitness. New strains of individuals from a new population were used to test the results obtained from the World Wide Web. The eight search engines used for this study were tested with two groups of words. All eight words were used as search keywords in all eight search engines, and the number of web pages returned by each search engine was collected. The total number of web pages for each selected new individual was calculated and tabulated. To find any association between the search words and the search engine combinations, the individuals were ranked from the most web pages to the least for each of the eight words. Results obtained through the creation of new populations by the prototype program showed that the average fitness of each population improved as new populations were created, and that new strains of individuals emerged through this evolutionary process. The test on results obtained from the Internet showed that certain classes of words could be associated with certain combinations of search engines.
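The abstract does not describe the prototype's encoding or operators, so the following is only a minimal sketch of how such a genetic algorithm might look. It assumes each individual is an 8-bit list marking which of the eight search engines are combined, and that fitness is the total number of pages those engines return for one keyword. The page counts in `HITS`, the function names, the roulette-wheel selection, single-point crossover, mutation rate and elitism are all illustrative assumptions, not details taken from the study.

```python
import random

# Hypothetical page counts per search engine for ONE keyword.
# Placeholder values for illustration only, not data from the study.
HITS = [120, 45, 300, 80, 10, 220, 60, 150]
N_ENGINES = len(HITS)


def fitness(individual):
    """Total pages returned by the engines this individual selects (assumed fitness)."""
    return sum(h for bit, h in zip(individual, HITS) if bit)


def select(population, rng):
    """Fitness-proportionate (roulette-wheel) selection of one parent."""
    weights = [fitness(ind) + 1 for ind in population]  # +1 avoids all-zero weights
    return rng.choices(population, weights=weights, k=1)[0]


def crossover(a, b, rng):
    """Single-point crossover: head of parent a, tail of parent b."""
    point = rng.randrange(1, N_ENGINES)
    return a[:point] + b[point:]


def mutate(individual, rng, rate=0.05):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in individual]


def average_fitness(population):
    return sum(fitness(ind) for ind in population) / len(population)


def evolve(pop_size=20, generations=10, seed=0):
    """Run the GA and return the final population and the average fitness per generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(N_ENGINES)]
                  for _ in range(pop_size)]
    history = [average_fitness(population)]
    for _ in range(generations):
        children = [max(population, key=fitness)]  # elitism: carry over the best individual
        while len(children) < pop_size:
            child = crossover(select(population, rng), select(population, rng), rng)
            children.append(mutate(child, rng))
        population = children
        history.append(average_fitness(population))
    return population, history


if __name__ == "__main__":
    final_population, history = evolve()
    for generation, avg in enumerate(history):
        print(f"generation {generation}: average fitness {avg:.1f}")
```

With this simple fitness the search trivially favours selecting every engine; a closer model of the study would need per-keyword counts and whatever constraints the prototype placed on combinations, neither of which the abstract specifies.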

Item Type: Project Paper Report
Subject: Genetic algorithms
Subject: Data mining
Subject: World Wide Web
Chairman Supervisor: Dr. Md Nasir Bin Sulaiman
Call Number: FSKTM 2001 19
Faculty or Institute: Faculty of Computer Science and Information Technology
ID Code: 8675
Deposited By: Nurul Hayatie Hashim
Deposited On: 09 Dec 2010 02:54
Last Modified: 12 Dec 2012 06:39



Universiti Putra Malaysia Institutional Repository

Universiti Putra Malaysia Institutional Repository is an on-line digital archive that serves as a central collection and storage of scientific information and research at the Universiti Putra Malaysia.

Currently, the collections deposited in the IR consist of Master and PhD Theses, Master and PhD Project Reports, Journal Articles, Journal Bulletins, Conference Papers, UPM News, Newspaper Cuttings, Patents and Inaugural Lectures.

As university policy does not permit users to view theses in full text, access is given to the first 24 pages only.