Twofold Integer Programming Model for Improving Rough Set Classification Accuracy in Data Mining.

Saeed, Walid (2005) Twofold Integer Programming Model for Improving Rough Set Classification Accuracy in Data Mining. PhD thesis, Universiti Putra Malaysia.


Abstract

The fast-growing size of databases has created great demand for tools capable of analyzing data in order to discover new knowledge and patterns. Such tools, it is hoped, will close the gap between the steady growth of information and the escalating demand to understand and discover the value of that information. These tools are collectively known as Data Mining (DM). One aim of DM is to discover decision rules for extracting meaningful knowledge. These rules consist of conditions over attribute-value pairs, called descriptions, and decision attributes. Generating a good decision or classification model is therefore a major component of much data mining research. The classification approach produces a function that maps a data item into one of several predefined classes: it takes a training dataset as input and builds a model of the class attribute based on the remaining attributes.

This research undertakes three main tasks. The first is to introduce a new rough model for minimum reduct selection and default rule generation, known as Twofold Integer Programming (TIP). The second is to enhance rule accuracy based on the first task, and the third is to classify new objects or cases. The TIP model is based on translating the discernibility relation of a Decision System (DS) into an Integer Programming (IP) model, which is solved using the branch and bound search method to generate the full reduct of the DS. The TIP model is then applied to the reduct to generate the default rules, which in turn are used to classify unseen objects with satisfactory accuracy. Apart from introducing the TIP model, this research also addresses the issues of missing values, discretization and the extraction of minimum rules. Missing values are treated and discretization is carried out during the preprocessing stage. Minimum rules are extracted after the default rules have been generated, in order to obtain the most useful of the discovered rules.

Eight datasets from machine learning repositories and domain theories are tested with the TIP model. The total number of rules, the rule length and the rule accuracy are recorded for the generated rules. The rule and classification accuracy of the TIP method is compared with that of other methods: Standard Integer Programming (SIP) and Decision Related Integer Programming (DRIP) from Rough Set, Genetic Algorithm (GA), the Johnson reducer, the Holte1R method, Multiple Regression (MR), Neural Network (NN), Induction of Decision Tree Algorithm (ID3) and Base Learning Algorithm (C4.5), all of which are commonly used in classification tasks.

Based on the experimental results, the classification method using the TIP approach successfully performed the rule generation and classification tasks required during a classification operation. The good accuracy obtained is mainly due to the selection of relevant attributes. The research shows that the TIP method can cater for different kinds of datasets and yields a good rough classification model, with promising results compared with other commonly used classifiers. It also opens a wide range of future work, including applying the proposed method in other areas such as web mining, text mining or multimedia mining, and extending the proposed approach to parallel computing in data mining.
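The abstract does not give the TIP formulation itself, so the following is only a minimal sketch of the general idea it builds on: turning the discernibility relation of a decision system into a 0-1 integer program and solving it by branch and bound to obtain a minimum reduct. It uses the textbook covering formulation (minimise the number of selected attributes such that every pair of objects with different decisions is discerned by at least one selected attribute), which is an assumption and not the thesis's model. The toy decision table and the function names `discernibility_clauses` and `minimum_reduct` are hypothetical.

```python
from itertools import combinations


def discernibility_clauses(rows, decisions):
    """For every pair of objects with different decisions, collect the set of
    condition-attribute indices on which the two objects differ."""
    clauses = set()
    for (r1, d1), (r2, d2) in combinations(zip(rows, decisions), 2):
        if d1 != d2:
            diff = frozenset(i for i, (a, b) in enumerate(zip(r1, r2)) if a != b)
            if diff:  # an empty set would mean the pair cannot be discerned
                clauses.add(diff)
    return list(clauses)


def minimum_reduct(clauses, n_attrs):
    """Solve: minimise sum(x) subject to sum(x[a] for a in clause) >= 1 for
    every clause, x binary, using depth-first branch and bound."""
    best = set(range(n_attrs))  # selecting every attribute is always feasible

    def branch(chosen, remaining):
        nonlocal best
        if len(chosen) >= len(best):  # bound: cannot beat the incumbent
            return
        uncovered = [c for c in remaining if not (c & chosen)]
        if not uncovered:  # every constraint satisfied: new incumbent
            best = set(chosen)
            return
        # Any feasible solution must pick an attribute from each uncovered
        # clause, so branching over the smallest one is complete.
        clause = min(uncovered, key=len)
        for a in sorted(clause):
            branch(chosen | {a}, uncovered)

    branch(frozenset(), clauses)
    return best


# Hypothetical toy decision system: four condition attributes, one decision.
rows = [
    (1, 0, 1, 0),
    (1, 1, 1, 0),
    (0, 0, 1, 1),
    (0, 1, 0, 1),
    (1, 0, 0, 0),
]
decisions = [0, 0, 1, 1, 1]

clauses = discernibility_clauses(rows, decisions)
reduct = minimum_reduct(clauses, n_attrs=4)
print(sorted(reduct))
```

Each clause corresponds to one constraint of the integer program, and the pruning test on the size of the incumbent is the bounding step. A production solver would add stronger lower bounds; this sketch only illustrates the translation from discernibility to constraints.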

Item Type: Thesis (PhD)
Subject: Programming language (Electronic computer)
Subject: Computer network resources
Chairman Supervisor: Associate Professor Hj. Md. Nasir Sulaiman, PhD
Call Number: FSKTM 2005 3
Faculty or Institute: Faculty of Computer Science and Information Technology
ID Code: 5848
Deposited By: Nur Izyan Mohd Zaki
Deposited On: 05 May 2010 08:44
Last Modified: 27 May 2013 07:25



Universiti Putra Malaysia Institutional Repository

Universiti Putra Malaysia Institutional Repository is an online digital archive that serves as a central collection and store of scientific information and research at Universiti Putra Malaysia.

Currently, the collections deposited in the IR consist of Master's and PhD theses, Master's and PhD project reports, journal articles, journal bulletins, conference papers, UPM news, newspaper cuttings, patents and inaugural lectures.

As university policy does not permit users to view theses in full text, access is given to the first 24 pages only.