New Learning Models for Generating Classification Rules Based on Rough Set Approach
Al Shalabi, Luai Abdel Lateef (2000) New Learning Models for Generating Classification Rules Based on Rough Set Approach. PhD thesis, Universiti Putra Malaysia.
Data sets, static or dynamic, are very important and useful for presenting real life features in different aspects of industry, medicine, economy, and others. Recently, different models were used to generate knowledge from vague and uncertain data sets such as induction decision tree, neural network, fuzzy logic, genetic algorithm, rough set theory, and others. All of these models take long time to learn for a huge and dynamic data set. Thus, the challenge is how to develop an efficient model that can decrease the learning time without affecting the quality of the generated classification rules. Huge information systems or data sets usually have some missing values due to unavailable data that affect the quality of the generated classification rules. Missing values lead to the difficulty of extracting useful information from that data set. Another challenge is how to solve the problem of missing data. Rough set theory is a new mathematical tool to deal with vagueness and uncertainty. It is a useful approach for uncovering classificatory knowledge and building a classification rules. So, the application of the theory as part of the learning models was proposed in this thesis. Two different models for learning in data sets were proposed based on two different reduction algorithms. The split-condition-merge-reduct algorithm ( SCMR) was performed on three different modules: partitioning the data set vertically into subsets, applying rough set concepts of reduction to each subset, and merging the reducts of all subsets to form the best reduct. The enhanced-split-condition-merge-reduct algorithm (E SCMR) was performed on the above three modules followed by another module that applies the rough set reduction concept again to the reduct generated by SCMR in order to generate the best reduct, which plays the same role as if all attributes in this subset existed. Classification rules were generated based on the best reduct. For the problem of missing data, a new approach was proposed based on data partitioning and function mode. In this new approach, the data set was partitioned horizontally into different subsets. All objects in each subset of data were described by only one classification value. The mode function was applied to each subset of data that has missing values in order to find the most frequently occurring value in each attribute. Missing values in that attribute were replaced by the mode value. The proposed approach for missing values produced better results compared to other approaches. Also, the proposed models for learning in data sets generated the classification rules faster than other methods. The accuracy of the classification rules by the proposed models was high compared to other models.
Repository Staff Only: Edit item detail