Application of Optimization Methods for Solving Clustering and Classification Problems

Citation

Shabanzadeh, Parvaneh (2011) Application of Optimization Methods for Solving Clustering and Classification Problems. PhD thesis, Universiti Putra Malaysia.

Abstract

Cluster analysis and classification are data mining topics with applications in many fields. Clustering identifies subsets of the data that are similar: intuitively, samples within a valid cluster are more similar to each other than to samples belonging to a different cluster, and samples in the same cluster share the same label. The aim of data classification is to establish rules for classifying observations when the classes of the data are assumed to be known. Here, there is a collection of labeled classes, and the problem is to assign a new observation or data point to one or more of these classes. This thesis focuses on new optimization methods for solving clustering and classification problems. We first give a brief background on data analysis and then review the methods currently available for solving clustering and classification problems.
The clustering problem is formulated as a non-smooth, non-convex optimization problem, and a new method for solving it is developed. This optimization problem has several characteristics that make it challenging: it has many local minima, the optimization variables can be either continuous or categorical, and no exact analytical derivatives are available. We show how a particular class of optimization methods, known as pattern search methods, can be applied to address these challenges. These methods do not explicitly use derivatives and are particularly appropriate when the functions are non-smooth. A new algorithm for finding the initial point is also proposed. Computational experiments on real data sets indicate that the proposed method is robust and compares favorably with previously known methods.
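To make the idea of a pattern search on a clustering objective concrete, the following is a minimal sketch rather than the algorithm developed in the thesis. It assumes the common minimum sum-of-squares objective (each point contributes its squared distance to the nearest center) and uses a basic compass search, one of the simplest members of the pattern search family. The function names, the toy data, and the choice of starting centers are all illustrative assumptions; the thesis's own initial-point algorithm is not reproduced here.

```python
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]


def clustering_objective(flat_centers: List[float], data: List[Point], k: int) -> float:
    """Sum over all points of the squared distance to the nearest of k centers.

    The inner min makes this function non-smooth and non-convex in the centers.
    """
    dim = len(data[0])
    centers = [flat_centers[j * dim:(j + 1) * dim] for j in range(k)]
    total = 0.0
    for x in data:
        total += min(sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers)
    return total


def compass_search(
    f: Callable[[List[float]], float],
    x0: Sequence[float],
    step: float = 1.0,
    tol: float = 1e-6,
    max_iter: int = 10_000,
) -> Tuple[List[float], float]:
    """Derivative-free compass search: poll +/- step along each coordinate,
    accept any trial point that lowers f, and halve the step when no poll
    direction improves. Only function values are used, never gradients."""
    x = list(x0)
    fx = f(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x.copy()
                trial[i] += delta
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx = trial, f_trial
                    improved = True
                    break
        if not improved:
            step *= 0.5
    return x, fx


# Toy data (hypothetical): two well-separated groups in the plane.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
# Assumed starting point: one data point from each group, flattened.
start = [0.0, 0.0, 10.0, 10.0]
best_centers, best_value = compass_search(
    lambda c: clustering_objective(c, data, k=2), start
)
print(best_centers, best_value)
```

Like any local method on a non-convex objective, this sketch only finds a local minimizer, so its result depends on the starting centers; that dependence is the reason a good initial point matters.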
Next, the problem of data classification is studied as a global, non-smooth, non-convex optimization problem. The approach consists of describing each of the given training sets by clusters: a data vector is assigned to the closest cluster and, correspondingly, to the set that contains this cluster. An algorithm based on a derivative-free method is applied to solve this problem. The proposed method has been tested on real-world data sets, and the results of the numerical experiments demonstrate the effectiveness of the proposed algorithm.
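The assignment rule described above (nearest cluster, then the training set that owns it) can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: the function name `classify`, the use of squared Euclidean distance, and the hard-coded centers are hypothetical. In the thesis the cluster centers would come from the derivative-free optimization step, which is not shown here.

```python
from typing import Dict, Hashable, List, Optional, Sequence

Point = Sequence[float]


def classify(x: Point, class_centers: Dict[Hashable, List[Point]]) -> Optional[Hashable]:
    """Return the label of the class that owns the cluster center closest to x.

    class_centers maps each class label to the list of cluster centers that
    describe that class's training set. Returns None if no centers are given.
    """
    best_label: Optional[Hashable] = None
    best_dist = float("inf")
    for label, centers in class_centers.items():
        for c in centers:
            dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label


# Hypothetical centers: class "A" is described by two clusters, class "B" by one.
centers_by_class = {
    "A": [(0.0, 0.0), (5.0, 5.0)],
    "B": [(10.0, 0.0)],
}
print(classify((4.0, 4.0), centers_by_class))
print(classify((9.0, 1.0), centers_by_class))
```

Describing a class by several centers instead of one lets the rule handle classes whose training points form more than one group.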

Additional Metadata

Item Type:	Thesis (PhD)
Subject:	Mathematical optimization
Subject:	Cluster analysis
Subject:	Discriminant analysis
Call Number:	IPM 2011 3
Chairman Supervisor:	Professor Malik Hj. Abu Hassan, PhD
Divisions:	Institute for Mathematical Research
Depositing User:	Najwani Amir Sariffudin
Date Deposited:	15 May 2014 03:32
Last Modified:	15 May 2014 03:32
URI:	http://psasir.upm.edu.my/id/eprint/19691