Citation
Dehaki, Ghazaleh Babanejad
(2020)
Dominance relationship-based skyline query framework over dynamic and incomplete database.
Doctoral thesis, Universiti Putra Malaysia.
Abstract
Skyline queries rely on the notion of Pareto dominance, filter the data items by keeping
only those data items that are the best, most preferred, also known as skylines, from a
database to meet the user’s preferences. Skyline query has been studied extensively
and a significant number of skyline algorithms have been proposed, mostly attempt to
resolve the optimisation problem that is mainly associated with a reduction in the
processing time of skyline computations. In today’s era, the presence of incomplete
data in a database is inevitable. The skyline algorithms in such situation will have to
deal with several issues besides the optimisation problem. The missing values in
databases give a negative influence on the number of pairwise comparisons that needs
to be performed between the data items. Moreover, the transitivity property of skylines
is no longer hold. Cyclic dominance is another issue that needs to be tackled as it
yields empty skyline results. Furthermore, databases are dynamic in nature in which
their states change throughout the time. These changes are necessary as databases must
reflect the current and latest information of the applications. The changes are normally
achieved through data manipulation operations and data definition operations. The
skylines derived before changes are made towards the initial database are no longer
valid in the new state of the database. Utilising the existing skyline algorithms would
require performing the algorithms on the new state of the database. However,
computing the skylines over the entire database after changes are made is inefficient
as not all the data items are affected by the changes.
In tackling the above stated issues, we propose a solution, named DyIn-Skyline, which
consists of three main phases, namely: Phase I – processing skyline queries over the
initial incomplete database, Phase II – processing skyline queries over a dynamic and
incomplete database, in which the changing state of the database is due to a data
manipulation operation(s) (insert, delete or update a data item(s)), and Phase III –
processing skyline queries over a dynamic and incomplete database, in which the changing state of the database is due to a data definition operation(s) (add or remove
a dimension(s)). For each phase, a framework is proposed. The proposed framework
in the Phase I consists of three main components, namely: Data Grouping Builder
(DGB), Bucket Skyline Identifier (BSI), and Final Skyline Identifier (FSI). We have
also introduced and designed three lists, namely: Bucket Dominating (BDG), Bucket
Dominated (BDD), and Domination History (DH) to keep track of the dominating data
items, dominated data items, and dominance relationships, respectively; this
information is useful and is utilised by the Phase II and Phase III of the DyIn-Skyline
solution. The framework of Phase II consists of three components, namely: Skyline-
Insert Identifier (S-II), which derives a set of skylines after a data item(s) is inserted
into a database, Skyline-Delete Identifier (S-DI), which derives a set of skylines after
an existing data item(s) is deleted from a database, and Skyline-Update Identifier (SUI),
which produces a set of skylines after an existing data item(s) of a database is
updated. Meanwhile, the framework of Phase III consists of two components, namely:
Skyline-Add Dimension Analyser (S-ADA) which derives a set of skylines after a new
dimension(s) is added to a database and Skyline-Remove Dimension Analyser (S-RDA)
which derives a set of skylines after an existing dimension(s) is removed from a
database.
Extensive experiments have been conducted to evaluate the performance and prove
the efficiency of our proposed solution, DyIn-Skyline, in processing skyline queries
over a dynamic and incomplete database. The performance results of DyIn-Skyline are
compared to other existing works that are the closest to this research, namely: ISkyline,
SIDS, and Incoskyline. In most cases, DyIn-Skyline shows a steady performance and
achieves better performance with regard to the number of pairwise comparisons and
processing time compared to the previous works. Unlike ISkyline, SIDS, and
Incoskyline which derive skylines over the entire database after changes are made
towards the database, i.e. the new state of the database, DyIn-Skyline avoids
unnecessary skyline computations. It relies on the information saved in the following
lists: Bucket Dominating (BDG), Bucket Dominated (BDD), and Domination History
(DH) and focuses only on those data items that are affected by the changes.
Download File
Additional Metadata
Actions (login required)
|
View Item |