UPM Institutional Repository

A heuristic approach for finding similarity indexes of multivariate data sets


Citation

Khan, Rahim and Zakarya, Muhammad and Khan, Ayaz Ali and Ur Rahman, Izaz and Abd Rahman, Mohd Amiruddin and Abdul Karim, Muhammad Khalis and Mustafa, Mohd Shafie (2020) A heuristic approach for finding similarity indexes of multivariate data sets. IEEE Access, 8. 21759 - 21769. ISSN 2169-3536

Abstract

Multivariate data sets (MDSs), often enormous in size and containing a certain proportion of noise/outliers, are generated routinely in various application domains. A major issue tightly coupled with these MDSs is how to compute their similarity indexes with the available resources in the presence of noise/outliers. This issue has been addressed through both classical and non-metric based approaches. However, classical techniques are sensitive to outliers, and most non-classical approaches are either problem/application specific or overly complex. The research community therefore strongly encourages the development of an efficient and reliable algorithm for MDSs with minimum time and space complexity. In this paper, a non-metric based similarity measure algorithm for MDSs is presented that successfully addresses the aforementioned issues, particularly noise and computational time. This technique finds the similarity indexes of noisy MDSs, of both equal and variable sizes, using the minimum possible resources, i.e., space and time. Experiments were conducted with both benchmark and real-time MDSs to evaluate the proposed algorithm's performance against its rival algorithms, namely traditional dynamic programming based and sequential similarity measure algorithms. Experimental results show that the proposed scheme performs exceptionally well compared with its counterpart algorithms in terms of time and space, and effectively tolerates a considerable portion of noisy data.
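The abstract names traditional dynamic programming based algorithms as the baseline and lists the longest common subsequence as a keyword. The following is a minimal sketch of such a baseline: a generic LCSS similarity index for multivariate sequences. It is not the authors' heuristic. It assumes that two points match when every dimension differs by at most a threshold `eps`, and that the index is normalized by the length of the shorter sequence. The function names are hypothetical.

```python
from typing import Sequence

Point = Sequence[float]


def points_match(a: Point, b: Point, eps: float) -> bool:
    """Assumed matching rule: every dimension differs by at most eps."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return all(abs(x - y) <= eps for x, y in zip(a, b))


def lcss_length(s: Sequence[Point], t: Sequence[Point], eps: float) -> int:
    """Classic O(n*m) dynamic-programming LCSS, keeping two rows (O(m) space)."""
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        curr = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if points_match(s[i - 1], t[j - 1], eps):
                # Matching points extend the common subsequence.
                curr[j] = prev[j - 1] + 1
            else:
                # Otherwise skip a point from one of the two sequences.
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[len(t)]


def lcss_similarity(s: Sequence[Point], t: Sequence[Point], eps: float) -> float:
    """Similarity index in [0, 1], normalized by the shorter sequence (assumption)."""
    if not s or not t:
        return 0.0
    return lcss_length(s, t, eps) / min(len(s), len(t))


if __name__ == "__main__":
    s = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
    # Same trend with small noise and one gross outlier inserted.
    t = [(0.1, 0.0), (1.0, 0.9), (50.0, -40.0), (2.0, 2.1), (3.1, 3.0)]
    print(lcss_similarity(s, t, eps=0.2))  # 1.0: the outlier is simply skipped
```

Because unmatched points are skipped rather than penalized, a single outlier does not distort the score. The quadratic running time of this kind of dynamic program is the cost the abstract says the proposed scheme reduces.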


Official URL or Download Paper: https://ieeexplore.ieee.org/document/8963981

Additional Metadata

Item Type: Article
Divisions: Faculty of Science
DOI Number: https://doi.org/10.1109/ACCESS.2020.2968222
Publisher: Institute of Electrical and Electronics Engineers
Keywords: Similarity index; Multivariate data set; Outliers; The longest common subsequence
Depositing User: Ms. Nuraida Ibrahim
Date Deposited: 06 Jul 2022 08:17
Last Modified: 06 Jul 2022 08:17
Altmetrics: http://www.altmetric.com/details.php?domain=psasir.upm.edu.my&doi=10.1109/ACCESS.2020.2968222
URI: http://psasir.upm.edu.my/id/eprint/87601
