Citation
Loo, Kevin Teow Aik (2001) Genetic Algorithm for Web Data Mining. [Project Paper Report]
Abstract
The choice of search engine can influence the number of results returned for a
search on the World Wide Web. This study therefore attempted to discover any
association between the types of words or information searched for and the search
engines used to search the World Wide Web. Such an association could assist the
process of mining the World Wide Web for information.
This study used a prototype program based on a genetic algorithm to
manipulate the initial set of data. Three sets of inputs were used to generate new
populations based on individual fitness. New strains of individuals from a new
population were used to test the results obtained from the World Wide Web. The eight
search engines used for this study were tested with two groups of words. All eight
words were used as search keywords in all eight search engines, and the number
of web pages returned by each search engine was collected. The total number of web
pages based on the selected new individuals was calculated and tabulated. To find
any association between the search words and the search engine combinations, the
individuals were ranked from the most web pages to the least for each of
the eight words.
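The procedure described above can be illustrated with a minimal sketch. The abstract does not specify the encoding, the fitness function, or the genetic operators, so everything here is an assumption: each individual is taken to be an 8-bit list marking which of the eight search engines are combined, fitness is the total number of pages those engines return for one search word, and selection, crossover, and mutation are the textbook roulette-wheel, one-point, and bit-flip variants. The page counts in PAGE_COUNTS are placeholder values, not data from the study, and all function names are hypothetical.

```python
import random

NUM_ENGINES = 8  # the study used eight search engines

# Placeholder page counts per engine for one search word.
# These are illustrative values only, NOT results from the study.
PAGE_COUNTS = [120, 45, 300, 80, 10, 220, 65, 150]


def fitness(individual):
    """Total pages returned by the engines switched on in this individual."""
    return sum(count for bit, count in zip(individual, PAGE_COUNTS) if bit)


def select(population, rng):
    """Roulette-wheel selection: pick one parent with probability tied to fitness."""
    weights = [fitness(ind) + 1 for ind in population]  # +1 avoids all-zero weights
    return rng.choices(population, weights=weights, k=1)[0]


def crossover(parent_a, parent_b, rng):
    """One-point crossover: head of one parent joined to the tail of the other."""
    point = rng.randint(1, NUM_ENGINES - 1)
    return parent_a[:point] + parent_b[point:]


def mutate(individual, rng, rate=0.05):
    """Flip each bit independently with a small probability (assumed rate)."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in individual]


def next_generation(population, rng):
    """Build a new population of the same size from selected parents."""
    return [
        mutate(crossover(select(population, rng), select(population, rng), rng), rng)
        for _ in population
    ]


def average_fitness(population):
    return sum(fitness(ind) for ind in population) / len(population)


def rank(population):
    """Order individuals from the most total web pages to the least."""
    return sorted(population, key=fitness, reverse=True)


def run(generations=5, size=10, seed=0):
    """Evolve a random starting population and record average fitness per generation."""
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(NUM_ENGINES)] for _ in range(size)
    ]
    history = [average_fitness(population)]
    for _ in range(generations):
        population = next_generation(population, rng)
        history.append(average_fitness(population))
    return rank(population), history


if __name__ == "__main__":
    ranked, history = run()
    print("average fitness per generation:", history)
    print("best engine combination:", ranked[0], "pages:", fitness(ranked[0]))
```

Repeating the run with a separate PAGE_COUNTS list for each search word, and comparing the resulting rankings, would be one way to look for the word-to-engine associations the study describes.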
Results obtained through the creation of new populations by the prototype
program showed that the average fitness of each population improved as new
populations were created, and that new strains of individuals emerged through this
evolutionary process. The test on results obtained from the Internet showed that
certain classes of words could be associated with certain combinations of search engines.