Citation
Al Shalabi, Luai Abdel Lateef
(2000)
New Learning Models for Generating Classification Rules Based on Rough Set Approach.
Doctoral thesis, Universiti Putra Malaysia.
Abstract
Data sets, static or dynamic, are very important and useful for presenting real life
features in different aspects of industry, medicine, economy, and others. Recently,
different models, such as decision tree induction, neural networks, fuzzy logic,
genetic algorithms, rough set theory, and others, were used to generate knowledge
from vague and uncertain data sets. All of these models take a long time to learn from a huge
and dynamic data set. Thus, the challenge is how to develop an efficient model that
can decrease the learning time without affecting the quality of the generated
classification rules. Huge information systems or data sets usually have some
missing values due to unavailable data that affect the quality of the generated
classification rules. Missing values lead to the difficulty of extracting useful
information from that data set. Another challenge is how to solve the problem of
missing data.
Rough set theory is a new mathematical tool to deal with vagueness and uncertainty.
It is a useful approach for uncovering classificatory knowledge and building
classification rules. So, the application of the theory as part of the learning models
was proposed in this thesis.
Two different models for learning in data sets were proposed based on two different
reduction algorithms. The split-condition-merge-reduct algorithm (SCMR)
comprised three modules: partitioning the data set vertically into subsets,
applying rough set concepts of reduction to each subset, and merging the reducts of
all subsets to form the best reduct. The enhanced-split-condition-merge-reduct
algorithm (ESCMR) performed the above three modules followed by another
module that applies the rough set reduction concept again to the reduct generated by
SCMR in order to generate the best reduct, which plays the same role as if all
attributes in this subset existed. Classification rules were generated based on the best
reduct.
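The split, reduce, and merge steps described above can be illustrated with a minimal sketch. The abstract does not say how the per-subset reduct is computed or how reducts are merged, so everything here is an assumption: the function names (dependency, reduce_attrs, scmr, escmr) are hypothetical, the reduction step is a simple greedy backward elimination that preserves the rough set dependency degree, and merging is taken to be concatenation of the per-subset reducts. It is not the thesis's implementation.

```python
from collections import defaultdict
from itertools import product


def dependency(rows, attrs, decision):
    """Rough set dependency degree: share of rows whose values on
    `attrs` fall in an equivalence class with a single decision value."""
    decisions = defaultdict(set)
    counts = defaultdict(int)
    for row in rows:
        key = tuple(row[a] for a in attrs)
        decisions[key].add(row[decision])
        counts[key] += 1
    consistent = sum(counts[k] for k, ds in decisions.items() if len(ds) == 1)
    return consistent / len(rows)


def reduce_attrs(rows, attrs, decision):
    """Assumed reduction step: greedily drop every attribute whose
    removal leaves the dependency degree of `attrs` unchanged."""
    target = dependency(rows, attrs, decision)
    kept = list(attrs)
    for a in list(attrs):
        trial = [x for x in kept if x != a]
        if dependency(rows, trial, decision) == target:
            kept = trial
    return kept


def scmr(rows, attrs, decision, n_parts):
    """Split the condition attributes vertically, reduce each subset,
    then merge the per-subset reducts (merge = concatenation, assumed)."""
    size = -(-len(attrs) // n_parts)  # ceiling division
    subsets = [attrs[i:i + size] for i in range(0, len(attrs), size)]
    merged = []
    for subset in subsets:
        merged.extend(reduce_attrs(rows, subset, decision))
    return merged


def escmr(rows, attrs, decision, n_parts):
    """Enhanced variant: apply the reduction once more to the merged set."""
    return reduce_attrs(rows, scmr(rows, attrs, decision, n_parts), decision)


# Toy table (invented for illustration): y = a OR d, c duplicates a, b is noise.
rows = [
    {"a": a, "b": b, "c": a, "d": d, "y": int(a or d)}
    for a, b, d in product([0, 1], repeat=3)
]
attrs = ["a", "b", "c", "d"]

print(scmr(rows, attrs, "y", n_parts=2))
print(escmr(rows, attrs, "y", n_parts=2))
```

On this toy table the merged set still contains the duplicated attribute pair a and c, because each was kept by a different subset; the extra reduction pass in the enhanced variant removes one of them. Note that a per-subset reduction of this kind can discard attributes that only matter in combination with attributes from another subset, which the sketch does not guard against.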
For the problem of missing data, a new approach was proposed based on data
partitioning and the mode function. In this new approach, the data set was partitioned
horizontally into different subsets. All objects in each subset of data were described
by only one classification value. The mode function was applied to each subset of
data that has missing values in order to find the most frequently occurring value in
each attribute. Missing values in that attribute were replaced by the mode value.
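The horizontal partitioning and mode replacement described above can be sketched as follows. This is only an illustration under assumptions: missing values are represented by None, the function name impute_by_class_mode is hypothetical, and ties between equally frequent values are broken by first occurrence, none of which the abstract specifies.

```python
from collections import Counter, defaultdict

MISSING = None  # assumed marker for a missing value


def impute_by_class_mode(rows, attrs, decision):
    """Group rows by their classification value, then replace each
    missing attribute value with the mode of that attribute within
    the row's own group. Returns new rows; the input is not modified."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[decision]].append(row)

    filled = []
    for row in rows:
        new_row = dict(row)
        for a in attrs:
            if new_row[a] is MISSING:
                observed = [r[a] for r in groups[row[decision]]
                            if r[a] is not MISSING]
                if observed:  # leave the gap if the whole group lacks a value
                    new_row[a] = Counter(observed).most_common(1)[0][0]
        filled.append(new_row)
    return filled


# Toy data invented for illustration.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": None, "play": "yes"},
    {"outlook": None, "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": None, "windy": "yes", "play": "no"},
]
filled = impute_by_class_mode(rows, ["outlook", "windy"], "play")
for row in filled:
    print(row)
```

Because the mode is taken within each classification value rather than over the whole table, the missing outlook in the last row is filled from the "no" group only.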
The proposed approach for missing values produced better results compared to other
approaches. Also, the proposed models for learning in data sets generated the
classification rules faster than other methods. The accuracy of the classification rules
generated by the proposed models was high compared to other models.