Citation
Mohd Saad, Nurul Husna
(2018)
Skyline queries on data with uncertain dimensions for efficient computation.
Doctoral thesis, Universiti Putra Malaysia.
Abstract
A skyline query finds the set of objects that are not dominated by
any other object. Skyline queries are crucial in multi-criteria decision-making
applications, particularly those that generate uncertain data. Although
a significant amount of research has been devoted to efficient
skyline computation, existing work does not address how to conduct skyline
queries on uncertain data whose objects are represented as continuous ranges and
exact values. In data with uncertain dimensions, the dominance
relation among objects with continuous ranges and exact values may not be
transitive, so existing techniques for skyline queries are not
applicable. The results of skyline queries are necessarily probabilistic, since
each object with a continuous range is now associated with a probability of
being a query answer. Furthermore, answering a range query
on uncertain dimensions is challenging, because it is difficult to determine which
objects with continuous ranges satisfy the range query. Hence, this thesis
focuses on efficiently extending skyline query and range skyline query
processing to support data with uncertain dimensions. We define skyline
queries over data with uncertain dimensions and present four methods to
efficiently answer skyline queries, namely: distinctive partitioning, exact
domination, range domination, and uncertain domination. We propose a two-phase
framework, SkyQUD, which integrates these four methods; the first
phase employs efficient probability computations which are performed
individually on groups of objects with exact values and continuous ranges,
respectively. Meanwhile, the second phase employs more complex and
expensive computations to perform dominance testing between objects from
different groups. The SkyQUD framework extracts the most
dominant skyline objects that meet the required threshold value. The threshold
value is used to control the quality and the size of the set of skyline
objects reported. Next, we extend SkyQUD to support range skyline
queries on uncertain dimensions, denoted as SkyQUD-T. A method, range
pruning, is proposed and incorporated before the first phase of SkyQUD to
determine the objects that satisfy the range query; it does so by bounding the
probability of each object to a certain threshold value. Both frameworks have been
validated through extensive experiments employing real and synthetic
datasets. Four independent variables (scalability, threshold, data
distribution, and dimensionality) are varied to study their effects on two
dependent variables: the number of pairwise comparisons
and processing time. Through theoretical analysis and extensive experiments,
we show that SkyQUD effectively supports skyline queries on data
with uncertain dimensions and is capable of handling large datasets. The
performance of SkyQUD-T is studied against two naïve algorithms
developed to reflect the best-case and worst-case scenarios. The results show
that the number of pairwise comparisons performed by SkyQUD-T always lies
within the bounds set by these two naïve algorithms.
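
To make the idea of probabilistic dominance under a threshold concrete, the following is a minimal brute-force sketch and not the SkyQUD framework itself. It assumes that smaller values are better, that a continuous range is uniformly distributed, that dimensions are independent, and that an object's skyline probability can be approximated as the product of its chances of not being dominated by each other object. None of these assumptions comes from the abstract, and the names `Interval`, `prob_le`, `prob_dominates`, and `threshold_skyline` are hypothetical.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Interval:
    """A continuous range [lo, hi] with lo < hi, assumed uniform (our assumption)."""
    lo: float
    hi: float


def prob_le(a, b, steps=1000):
    """P(a <= b) in one dimension; each argument is a float or an Interval."""
    a_rng, b_rng = isinstance(a, Interval), isinstance(b, Interval)
    if not a_rng and not b_rng:
        return 1.0 if a <= b else 0.0
    if a_rng and not b_rng:
        # Fraction of a's range that lies at or below the exact value b.
        return min(1.0, max(0.0, (b - a.lo) / (a.hi - a.lo)))
    if b_rng and not a_rng:
        # Fraction of b's range that lies at or above the exact value a.
        return min(1.0, max(0.0, (b.hi - a) / (b.hi - b.lo)))
    # Both are ranges: integrate P(x <= b) over a with the midpoint rule.
    width = (a.hi - a.lo) / steps
    return sum(prob_le(a.lo + (i + 0.5) * width, b) for i in range(steps)) / steps


def prob_dominates(u, v):
    """P(u dominates v): u no worse in every dimension, better in at least one."""
    if all(not isinstance(x, Interval) and x == y for x, y in zip(u, v)):
        return 0.0  # identical exact objects do not dominate each other
    # Ties involving a range have probability zero, so the product suffices.
    return prod(prob_le(x, y) for x, y in zip(u, v))


def threshold_skyline(objects, threshold):
    """Names of objects whose approximate skyline probability >= threshold."""
    result = []
    for name, u in objects.items():
        p_sky = prod(1.0 - prob_dominates(v, u)
                     for other, v in objects.items() if other != name)
        if p_sky >= threshold:
            result.append(name)
    return result


# Illustrative data: object "a" has one uncertain dimension.
objects = {
    "a": (Interval(2.0, 6.0), 3.0),
    "b": (4.0, 5.0),
    "c": (5.0, 1.0),
}
print(threshold_skyline(objects, 0.6))  # prints ['a', 'c']
```

In this example, "b" is dominated by "a" with probability 0.5, so its skyline probability of 0.5 falls below the 0.6 threshold. The sketch compares every pair of objects, which is the kind of cost that grouping objects and pruning before cross-group dominance tests is meant to reduce.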