UPM Institutional Repository

Statistical estimators as an alternative to standard deviation in weighted Euclidean distance cluster analysis


Citation

Dalatu, Paul Inuwa and Midi, Habshah (2018) Statistical estimators as an alternative to standard deviation in weighted Euclidean distance cluster analysis. Pertanika Journal of Science & Technology, 26 (4). pp. 1823-1836. ISSN 0128-7680; ESSN: 2231-8526

Abstract

Clustering is one of the primary tools of data mining. It helps researchers understand the natural grouping of attributes in datasets. Clustering is an unsupervised classification method whose main aim is to partition data so that objects in the same cluster are similar and objects in different clusters differ significantly with respect to their attributes. However, the classical standardized Euclidean distance, which uses the standard deviation to down-weight the maximum points of the ith feature in distance-based clustering, has been criticized by many scholars on the grounds that it produces outliers, lacks robustness, and has a 0% breakdown point. It also has low efficiency under the normal distribution. To remedy the problem, we suggest two statistical estimators with a 50% breakdown point, namely the Sn and Qn estimators, which have 58% and 82% efficiency, respectively. The proposed methods outperformed the existing methods in down-weighting the maximum points of the ith feature in distance-based clustering analysis.
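As a rough illustration of the idea in the abstract, the sketch below scales each feature by a robust estimate (Sn or Qn) instead of the standard deviation before computing a Euclidean distance. It is not the authors' implementation. The function names are hypothetical, the constants 1.1926 and 2.2219 are the usual consistency factors for Sn and Qn at the normal distribution, finite-sample corrections are omitted, and plain medians stand in for the low/high medians of the original Sn definition.

```python
import numpy as np
from math import comb


def sn_scale(x):
    """Simplified Sn: 1.1926 * med_i( med_j |x_i - x_j| ), using plain medians."""
    x = np.asarray(x, dtype=float)
    diffs = np.abs(x[:, None] - x[None, :])  # all pairwise absolute differences
    return 1.1926 * np.median(np.median(diffs, axis=1))


def qn_scale(x):
    """Simplified Qn: 2.2219 * k-th smallest |x_i - x_j| over i < j,
    with k = C(h, 2) and h = floor(n / 2) + 1."""
    x = np.asarray(x, dtype=float)
    n = x.size
    h = n // 2 + 1
    k = comb(h, 2)
    i, j = np.triu_indices(n, k=1)           # index pairs with i < j
    d = np.sort(np.abs(x[i] - x[j]))
    return 2.2219 * d[k - 1]


def weighted_euclidean(a, b, scales):
    """Euclidean distance after dividing each feature difference by its scale."""
    a, b, scales = (np.asarray(v, dtype=float) for v in (a, b, scales))
    return float(np.sqrt(np.sum(((a - b) / scales) ** 2)))


# Hypothetical feature column with one extreme value.
feature = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
sd = np.std(feature, ddof=1)   # inflated by the extreme value
sn = sn_scale(feature)         # stays close to the spread of the bulk
qn = qn_scale(feature)
```

On the toy column above, the single extreme value inflates the standard deviation, while both robust scales stay near the spread of the remaining points. That is the property the abstract relies on when it replaces the standard deviation as the per-feature weight.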


Download File

Text: 17 JST-1003-2017.pdf (476kB)

Additional Metadata

Item Type: Article
Divisions: Faculty of Science; Institute for Mathematical Research
Publisher: Universiti Putra Malaysia Press
Keywords: Clustering; Estimators; K-means; Simulation; Weighted
Depositing User: Nabilah Mustapa
Date Deposited: 12 Feb 2019 07:04
Last Modified: 12 Feb 2019 07:04
URI: http://psasir.upm.edu.my/id/eprint/66312