Citation
Hassn, Ahmed Kadom (2017) Improved clustering using robust and classical principal component. Masters thesis, Universiti Putra Malaysia.
Abstract
The k-means algorithm is a popular data clustering algorithm. It aims to partition n observations into k clusters such that each observation belongs to the cluster with the nearest mean, which serves as the prototype of the cluster. Finding the appropriate number of clusters for a given data set is generally a trial-and-error process, made more difficult by the subjective nature of deciding what constitutes a ‘correct’ clustering. When the dimension of the data is large, it is often difficult to apply the k-means clustering algorithm because it requires a large amount of computation time. To remedy this problem, we propose integrating principal component analysis (PCA), which is useful for reducing the dimensionality of a dataset, with the k-means clustering algorithm. We call the proposed method k-means by principal components (pc1). In this study, the kernels created by the k-means method are replaced with kernels created by the PCA method, which reduces the dimensionality of the data. The results of the study show that k-means by PCA is faster and more efficient than the classical k-means algorithm. Both the classical k-means algorithm and k-means by PCA are very sensitive to the presence of outliers. Hence, k-means by robust PCA is developed to rectify the problem of outliers in the dataset. The findings indicate that, in the absence of outliers, k-means by PCA and k-means by robust PCA perform equally well. However, k-means by robust PCA is much less affected by outliers than k-means by classical PCA.
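The general idea of clustering in a PCA-reduced space can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `pca_reduce` and `kmeans`, the synthetic two-blob data, the choice of two components, and the deterministic farthest-point initialization are all assumptions made for illustration, and only classical (non-robust) PCA is shown.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Classical PCA via SVD: project centered rows of X onto the top components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm with a farthest-point start (an illustrative assumption)."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centers chosen so far.
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each observation to the cluster with the nearest mean.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each mean; keep the old center if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical data: two well-separated groups in 50 dimensions.
rng = np.random.default_rng(0)
A = rng.normal(0.0, 1.0, size=(100, 50))
B = rng.normal(0.0, 1.0, size=(100, 50)) + 10.0
X = np.vstack([A, B])

Z = pca_reduce(X, n_components=2)   # 50 dimensions reduced to 2
labels, centers = kmeans(Z, k=2)    # cluster in the reduced space
```

In this sketch the distance computations inside `kmeans` run on 2 columns instead of 50, which is where a speed-up from dimensionality reduction would come from. Because the sample mean and covariance underlying classical PCA are themselves sensitive to outliers, a robust variant would replace the `pca_reduce` step with a robust estimator.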