Citation
Garib, Sarmad Makki Mohammed
(2019)
Efficient genetic partitioning-around-medoid algorithm for clustering.
Doctoral thesis, Universiti Putra Malaysia.
Abstract
Throughout the years, considerable efforts made to tackle the clustering problem. Yet,
because of the nature of the clustering problem, finding an efficient clustering
optimization algorithm with reasonable performance is still an open challenge. In
general, genetic based clustering algorithms showed the ability to reach near global
optimal solution. These algorithms mostly built upon the partitioning k-means
clustering algorithm. Nevertheless, these algorithms either dealt with part of the issues
inherited in the k-means or in turn produce some deficiency by itself.
One of the main issues in genetic k-means based algorithms is their sensitivity to
outliers and unevenly distributed clusters due to the mean compromised computations.
Besides, these algorithms more frequently implemented with the lengthy and
redundant fixed N length integer encoding. Additionally, they used either random or
non-mathematically proved population initializations methods. Adopting the medoid
instead of the mean can enhance the efficiency. However, the complexity of the kmedoid
based algorithms in general is more than the complexity of the k-means based
algorithms. Lastly, in order to externally judge the validity of these types of clustering
algorithms, there is a need for a method to correctly and efficiently evaluate their
variant multiclass clustering results.
This study utilizes genetic algorithms based upon the medoid rather than the mean as
a centroid-selection schema to improve the clustering efficiency. It uses the compact
and unique variable K length real encoding. Accordingly, the corresponding genetic
operators are adapted to suite the medoid and to incorporate much clustering-specific
domain knowledge. The algorithm is also preceded with careful seeding using
mathematically proved to converge k-means++ algorithm. Secondly, by utilizing the
medoid property as being an actual data item in the dataset and with the aid of the proposed indexing method, the complexity of the computation is notably decreased.
Finally, an algorithm is developed for automatic evaluation of external validity
measures on the generated variant multiclass clustering results.
The experiments are divided into four interdependent sets. The first set showed the
efficiency and performance of the proposed composing techniques including: kmean++
algorithm for initialization, fitness scaling for fitness selection, the proposed
split and merge mutation operator, and choosing the medoid instead of the mean as a
centroid-selection schema. The rest sets of experiments carried out to evaluate the
proposed algorithms. Precisely, the second set revealed that the proposed genetic
medoid based algorithms with both DB and VRC fitness functions produced more
accurate results compared with the genetic means based algorithms in terms of the Fscore.
The third set affirmed that the enhancement on the proposed algorithm, which
made use of indexing method that suits the medoids, could boost the performance to
about 9 to 27 times in terms of execution time depending on the complexity of the
dataset. Finally, the fourth set dealt with the developed method for externally
evaluating the variant multi class clusters. The experiments acknowledged that the
proposed relabeling algorithm could perform better in quality than the external F-score
measure. Also it outperform in term of its complexity O(N2) compared with the
complexity O(2N) of the Adjusted Rand Index (ARI).
Download File
Additional Metadata
Actions (login required)
|
View Item |