Case Slicing Technique for Feature Selection

A. Shiba, Omar A. (2004) Case Slicing Technique for Feature Selection. PhD thesis, Universiti Putra Malaysia.


Abstract

One of the problems addressed by machine learning is data classification. Finding a good classification algorithm is an important component of many data mining projects. Since the 1960s, many algorithms for data classification have been proposed. Data mining researchers often use classifiers to identify important classes of objects within a data repository. This research undertakes two main tasks. The first task is to introduce a slicing technique for feature subset selection. The second task is to enhance classification accuracy based on the first task, so that objects or cases can be classified using only the selected relevant features. This new approach is called the Case Slicing Technique (CST). Applying this technique to a classification task can further enhance case classification accuracy. CST helps identify the subset of features used in computing the similarity measures needed by classification algorithms. CST was tested on nine datasets from the UCI machine learning repositories and domain theories. The maximum and minimum accuracies obtained are 99% and 96% respectively, based on the evaluation approach. The most commonly used evaluation technique is k-fold cross-validation. This technique, with k = 10, has been used in this thesis to evaluate the proposed approach. CST was compared to other selected classification methods based on feature subset selection, such as the Induction of Decision Tree Algorithm (ID3), the Base Learning Algorithm K-Nearest Neighbour Algorithm (k-NN) and the Naïve Bayes Algorithm (NB). All these approaches are implemented with the RELIEF feature selection approach.
The classification accuracy obtained from the CST method is also compared to that of other selected classification methods: Value Difference Metric (VDM), Per-Category Feature Importance (PCF), Cross-Category Feature Importance (CCF), Instance-Based Algorithm (IB4), Decision Tree Algorithms such as Induction of Decision Tree Algorithm (ID3) and Base Learning Algorithm (C4.5), Rough Set methods such as Standard Integer Programming (SIP) and Decision Related Integer Programming (DRIP), and Neural Network methods such as the Multilayer method.
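
The evaluation setup described in the abstract (k-fold cross-validation with k = 10, applied to a classifier whose similarity measure uses only a selected feature subset) can be illustrated with a minimal sketch. This is not the thesis's CST implementation: the classifier is a plain k-NN with squared Euclidean distance, the feature subset is chosen by hand, and the function names and toy data (`make_toy_cases`, `cross_validated_accuracy`, and so on) are hypothetical, introduced only for this example.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Shuffle range(n) and split it into k disjoint folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def distance(a, b, features):
    """Squared Euclidean distance computed over the selected feature indices only."""
    return sum((a[f] - b[f]) ** 2 for f in features)


def knn_predict(train, query, features, n_neighbors=3):
    """Majority vote among the nearest training cases; train holds (vector, label) pairs."""
    nearest = sorted(train, key=lambda case: distance(case[0], query, features))
    votes = Counter(label for _, label in nearest[:n_neighbors])
    return votes.most_common(1)[0][0]


def cross_validated_accuracy(cases, features, k=10, n_neighbors=3, seed=0):
    """Hold out each fold once, train on the rest, and return overall accuracy."""
    correct = 0
    for fold in k_fold_indices(len(cases), k, seed):
        held_out = set(fold)
        train = [c for i, c in enumerate(cases) if i not in held_out]
        for i in fold:
            vector, label = cases[i]
            if knn_predict(train, vector, features, n_neighbors) == label:
                correct += 1
    return correct / len(cases)


def make_toy_cases(n=100, seed=1):
    """Toy data (assumption): feature 0 separates the classes, feature 1 is noise."""
    rng = random.Random(seed)
    cases = []
    for i in range(n):
        label = i % 2
        informative = label + rng.uniform(-0.2, 0.2)
        noise = rng.uniform(-5.0, 5.0)
        cases.append(([informative, noise], label))
    return cases


cases = make_toy_cases()
print("all features:     ", cross_validated_accuracy(cases, features=[0, 1]))
print("selected features:", cross_validated_accuracy(cases, features=[0]))
```

Comparing the two printed accuracies shows how restricting the distance computation to relevant features can change the result of a similarity-based classifier; how CST actually selects those features is described in the thesis itself, not here.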

Item Type: Thesis (PhD)
Subject: Machine learning
Subject: Classification
Subject: Data mining
Chairman Supervisor: Associate Professor Hj. Md. Nasir Sulaiman, PhD
Call Number: FSKTM 2004 6
Faculty or Institute: Faculty of Computer Science and Information Technology
ID Code: 5838
Deposited By: Nur Izyan Mohd Zaki
Deposited On: 05 May 2010 08:55
Last Modified: 27 May 2013 07:25


Universiti Putra Malaysia Institutional Repository

Universiti Putra Malaysia Institutional Repository is an on-line digital archive that serves as a central collection and storage of scientific information and research at the Universiti Putra Malaysia.

Currently, the collections deposited in the IR consist of Master and PhD theses, Master and PhD Project Reports, Journal Articles, Journal Bulletins, Conference Papers, UPM News, Newspaper Cuttings, Patents and Inaugural Lectures.

As university policy does not permit users to view theses in full text, access is given to the first 24 pages only.