Citation
Shabanzadeh, Parvaneh (2011) Application of Optimization Methods for Solving Clustering and Classification Problems. PhD thesis, Universiti Putra Malaysia.
Abstract
Cluster analysis and classification are data mining topics with applications in many fields. Clustering involves identifying subsets of the data that are similar. Intuitively, samples within a valid cluster are more similar to each other than they are to a sample belonging to a different cluster. Samples in the same cluster have the same label. The aim of data classification is to establish rules for classifying observations when the classes of the data are assumed to be known. Here, there is a collection of labelled classes, and the problem is to label a new observation or data point as belonging to one or more of these classes. The focus of this thesis is on solving clustering and classification problems. Specifically, we focus on new optimization methods for solving these problems. First we briefly give some background on data analysis. We then review the methods currently available for solving clustering and classification problems.
The clustering problem is treated as a non-smooth, non-convex optimization problem, and a new method for solving it is developed. This optimization problem has a number of characteristics that make it challenging: it has many local minima, the optimization variables can be either continuous or categorical, and there are no exact analytical derivatives. In this study we show how to apply a particular class of optimization methods, known as pattern search methods, to address these challenges. These methods do not explicitly use derivatives and are particularly appropriate when functions are non-smooth. A new algorithm for finding the initial point is also proposed. We show that the proposed method produces excellent results compared with previously known methods. Results of computational experiments on real data sets demonstrate the robustness and advantages of the new method. Next, the problem of data classification is studied as a global, non-smooth and non-convex optimization problem; this approach consists of describing clusters for the given training sets. Data vectors are assigned to the closest cluster and, correspondingly, to the set that contains this cluster; an algorithm based on a derivative-free method is applied to solve this problem. The proposed method has been tested on real-world datasets. Results of numerical experiments are presented that demonstrate the effectiveness of the proposed algorithm.
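
To illustrate the kind of derivative-free approach the abstract describes, the following is a minimal sketch of a basic pattern search (compass search) applied to a clustering objective. It is not the thesis's algorithm: the abstract does not specify the objective, the poll directions, or the initial-point procedure. The sketch assumes the common minimum sum-of-squares formulation, in which the objective is the sum over all points of the squared distance to the nearest cluster center, and it assumes the starting centers are supplied by the caller. The names `clustering_objective` and `pattern_search` are hypothetical.

```python
def clustering_objective(centers, points):
    """Sum over points of the squared distance to the nearest center.

    Assumed formulation (minimum sum-of-squares clustering); the inner
    min makes the function non-smooth and non-convex in the centers.
    """
    total = 0.0
    for p in points:
        total += min(
            sum((pi - ci) ** 2 for pi, ci in zip(p, c)) for c in centers
        )
    return total


def pattern_search(points, centers, step=1.0, tol=1e-6, max_iter=10000):
    """Compass search over the center coordinates; uses no derivatives.

    Polls +step and -step along every coordinate of every center,
    accepts any move that lowers the objective, and halves the step
    when a full poll finds no improvement.
    """
    centers = [list(c) for c in centers]
    best = clustering_objective(centers, points)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for j in range(len(centers)):
            for d in range(len(centers[j])):
                for delta in (step, -step):
                    old = centers[j][d]
                    centers[j][d] = old + delta
                    value = clustering_objective(centers, points)
                    if value < best:
                        best = value      # keep the improving move
                        improved = True
                        break
                    centers[j][d] = old   # undo the trial move
        if not improved:
            step *= 0.5                   # refine the mesh
    return centers, best


# Illustrative data only: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
start = [(2.0, 2.0), (8.0, 8.0)]  # assumed initial centers
centers, value = pattern_search(points, start)
```

Because the search only compares objective values, it tolerates the kinks introduced by the nearest-center minimum. It still finds only a local minimum, so the result depends on the starting centers, which is consistent with the abstract's emphasis on choosing a good initial point.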