UPM Institutional Repository

Skyline queries on data with uncertain dimensions for efficient computation


Citation

Mohd Saad, Nurul Husna (2018) Skyline queries on data with uncertain dimensions for efficient computation. Doctoral thesis, Universiti Putra Malaysia.

Abstract

A skyline query finds the set of objects that are not dominated by any other object. Skyline queries are crucial in multi-criteria decision-making applications, particularly those that generate uncertain data. Although a significant amount of research has been devoted to efficient skyline computation, existing works do not address how to conduct skyline queries on uncertain data in which objects are represented as continuous ranges and exact values. When data have uncertain dimensions, the dominance relation among objects with continuous ranges and exact values may not be transitive, so existing skyline query techniques are not applicable. The results of skyline queries are bound to be probabilistic, since each object with a continuous range is now associated with a probability of being a query answer. Furthermore, answering a range query on uncertain dimensions is challenging, because it must be determined which objects with continuous ranges satisfy the query range. Hence, this thesis focuses on efficiently extending skyline query and range skyline query processing to support data with uncertain dimensions. We define skyline queries over data with uncertain dimensions and present four methods to answer them efficiently, namely distinctive partitioning, exact domination, range domination, and uncertain domination. We propose a two-phase framework, SkyQUD, which integrates these four methods. The first phase employs efficient probability computations that are performed separately on the group of objects with exact values and the group of objects with continuous ranges. The second phase employs more complex and expensive computations to perform dominance testing between objects from different groups. The SkyQUD framework extracts the most dominant skyline objects that meet the required threshold value.
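
To make the notions of dominance and skyline concrete, the following is a minimal sketch and not the SkyQUD method itself. It assumes that smaller values are preferred in every dimension, and, for the uncertain case, that a continuous range is uniformly distributed over its interval. The names `dominates`, `skyline`, and `prob_less` are hypothetical and do not come from the thesis.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every dimension and strictly
    better in at least one (assumption: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    """Naive quadratic skyline over exact values: keep every point
    that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


def prob_less(rng: Tuple[float, float], value: float) -> float:
    """Probability that a value drawn from the range [lo, hi] is smaller
    than an exact value (assumption: uniform distribution over the range)."""
    lo, hi = rng
    if hi == lo:  # degenerate range behaves like an exact value
        return 1.0 if lo < value else 0.0
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


# Hypothetical data: (price, distance) pairs with exact values only.
hotels = [(50.0, 3.0), (80.0, 1.0), (60.0, 4.0), (90.0, 2.0)]
print(skyline(hotels))

# A price known only as the range 100-200 compared with an exact price of 150.
print(prob_less((100.0, 200.0), 150.0))
```

With exact values, dominance is a yes-or-no test. Once a dimension holds a range, the comparison yields a probability instead, which is why the skyline result becomes probabilistic.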
The threshold value is used to manage the quality and the size of the set of skyline objects reported. Next, we extend SkyQUD to support skyline queries with range queries on uncertain dimensions; the extension is denoted SkyQUD-T. A method, range pruning, is proposed and incorporated before the first phase of SkyQUD to determine the objects that satisfy the range query, where it bounds the probability of each object to a certain threshold value. Both frameworks have been validated through extensive experiments employing real and synthetic datasets. Four independent variables (scalability, threshold, data distribution, and dimensionality) are manipulated to study their effects on two dependent variables: the number of pairwise comparisons and the processing time. Through theoretical analysis and extensive experiments, we show that SkyQUD effectively supports skyline queries on data with uncertain dimensions and is capable of handling large datasets. The performance of SkyQUD-T is studied against two naïve algorithms developed to reflect the best-case and worst-case scenarios. The results show that the number of pairwise comparisons performed by SkyQUD-T always lies between those of the two naïve algorithms.
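
The idea of keeping only the objects that satisfy a range query with sufficient probability can be illustrated with a small sketch. It is not the range pruning method of the thesis. It assumes a single uncertain dimension, a uniform distribution over each continuous range, and a hypothetical threshold of 0.5. The names `prob_in_range` and `range_filter` are illustrative only.

```python
from typing import List, Tuple, Union

# A dimension value is either exact (float) or a continuous range (lo, hi).
Value = Union[float, Tuple[float, float]]


def prob_in_range(value: Value, q_lo: float, q_hi: float) -> float:
    """Probability that a value falls inside the query range [q_lo, q_hi]
    (assumption: ranges are uniformly distributed)."""
    if isinstance(value, tuple):
        lo, hi = value
        if hi == lo:  # degenerate range behaves like an exact value
            return 1.0 if q_lo <= lo <= q_hi else 0.0
        overlap = max(0.0, min(hi, q_hi) - max(lo, q_lo))
        return overlap / (hi - lo)
    return 1.0 if q_lo <= value <= q_hi else 0.0


def range_filter(values: List[Value], q_lo: float, q_hi: float,
                 threshold: float) -> List[Value]:
    """Keep the values whose probability of satisfying the range query
    reaches the threshold."""
    return [v for v in values if prob_in_range(v, q_lo, q_hi) >= threshold]


# Hypothetical rents: two exact values and two continuous ranges.
rents = [120.0, (100.0, 200.0), (150.0, 400.0), 300.0]
print(range_filter(rents, 100.0, 250.0, 0.5))
```

In this sketch the range (150, 400) overlaps the query range on only 100 of its 250 units, so its probability of 0.4 falls below the threshold and it is filtered out before any dominance testing would take place.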


Additional Metadata

Item Type: Thesis (Doctoral)
Subject: Database management
Subject: Querying (Computer science)
Subject: Data mining
Call Number: FSKTM 2018 72
Chairman Supervisor: Professor Hamidah Ibrahim, PhD
Divisions: Faculty of Computer Science and Information Technology
Depositing User: Ms. Nur Faseha Mohd Kadim
Date Deposited: 11 Feb 2020 02:08
Last Modified: 11 Feb 2020 02:08
URI: http://psasir.upm.edu.my/id/eprint/76991