Citation
Mansir, Abubakar (2021) Instance matching framework for heterogeneous semantic web
content over linked data environment. Doctoral thesis, Universiti Putra Malaysia.
Abstract
Over the past decade, instance matching has been a possible method of discovering
relationships among heterogeneous Resource Description Framework (RDF) data that
represent the same real-world entity in the Linked Data environment. The recent
exponential growth of data in volume, variety and velocity makes it difficult for
existing instance matching frameworks to discover relationships effectively and
generate a matching output. These frameworks perform a large number of comparisons
when discovering matching attributes at the initial stage, which leads to missing
attributes when generating training samples and thus results in incomplete alignments
as matching output. Manual parameter configuration is another problem associated with
existing matching frameworks, which makes them weak in handling highly heterogeneous
data. A further consequence of these problems is the long time taken to generate an
alignment and the high memory usage during the process.
An effective and scalable instance matching framework is needed to improve matching
performance. In this study, an instance matching framework is proposed to address the
identified problems and to improve the ability to generate more accurate matching
output (alignment) in minimal running time. The framework adapts the methods used in
the benchmark studies, adding components and modifying some existing components to
boost matching performance. The proposed framework works interactively with the
following components: serialisation and pre-processing, unsupervised training set
generation, property alignment and two-fold similarity generation.
Serialisation involves translating RDF data from the N-Triples file format to the
Comma Separated Values (CSV) file format, while pre-processing performs basic text
filtering. In the attribute discovery component, potential matching attribute pairs
are discovered by clustering the attributes of matching instances into similar and
non-similar clusters. These discovered attributes serve as input to a modified
training set generation component, where training sets are generated based on the
clusters of potential attributes. Property alignment checks for irregular data
associated with the generated sets to optimise the matching performance. The last
component generates similarity with self-configuring behaviour.
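As an illustration of the serialisation and pre-processing step, the following minimal
Python sketch converts simple N-Triples statements to CSV rows and applies a basic text
filter to literal values. It is not the thesis implementation: the function names
(clean_text, ntriples_to_rows, ntriples_to_csv), the three-column CSV layout and the
particular filter (lowercasing, punctuation removal, whitespace collapsing) are
assumptions made for illustration, and the regular expressions only handle statements
whose subject and predicate are IRIs.

```python
import csv
import io
import re

# Matches one simple N-Triples statement: <subject> <predicate> object .
# Assumption: subjects and predicates are IRIs; blank nodes are not handled.
TRIPLE_RE = re.compile(r'^\s*<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')
# Matches a quoted literal with an optional language tag or datatype suffix.
LITERAL_RE = re.compile(r'^"((?:[^"\\]|\\.)*)"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?$')


def clean_text(value):
    """Hypothetical basic text filter: lowercase, drop punctuation, collapse spaces."""
    value = re.sub(r'[^\w\s]', ' ', value.lower())
    return ' '.join(value.split())


def ntriples_to_rows(lines):
    """Yield (subject, predicate, object) tuples from N-Triples lines."""
    for line in lines:
        if not line.strip() or line.lstrip().startswith('#'):
            continue  # skip blank lines and comments
        match = TRIPLE_RE.match(line)
        if match is None:
            continue  # skip statements this simple parser cannot read
        subject, predicate, obj = match.groups()
        literal = LITERAL_RE.match(obj)
        if literal:
            obj = clean_text(literal.group(1))  # filter literal text only
        else:
            obj = obj.strip('<>')  # IRI object: keep the IRI without brackets
        yield subject, predicate, obj


def ntriples_to_csv(lines):
    """Return CSV text with a header row and one row per parsed triple."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow(['subject', 'predicate', 'object'])
    writer.writerows(ntriples_to_rows(lines))
    return buffer.getvalue()


sample = [
    '<http://ex.org/a> <http://ex.org/name> "Kuala  Lumpur, MY"@en .',
    '# a comment line',
    '<http://ex.org/a> <http://ex.org/sameAs> <http://ex.org/b> .',
]
print(ntriples_to_csv(sample))
```

Keeping one triple per CSV row preserves the source data unchanged apart from the text
filter; a real pipeline might instead pivot predicates into columns per instance, which
the abstract does not specify.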
Experiments were conducted to evaluate the performance of the individual components
and the output of the framework as a whole. The evaluation was performed on real-world
datasets provided in different Ontology Alignment Evaluation Initiative (OAEI)
campaigns as benchmark data for the instance matching track. The output of each
algorithm was evaluated, and the results show that each algorithm performs well and
outperforms the existing algorithms on all test cases in terms of better output
generation and effective handling of heterogeneity across different domains, which is
a necessary concern in all data-intensive problems.
The proposed framework demonstrated a significant improvement over the benchmark
frameworks, Agreement Maker Light (AML), RiMOM-Instance Matching (RiMOM-IM) and
Unsupervised Instance Matcher, in terms of the accuracy of alignment generation in
minimal time, with the ability to accommodate the growing size of Linked Data (LD) in
today’s web content.
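The abstract does not state which measures were used to assess alignment accuracy.
OAEI instance matching tracks commonly report precision, recall and F-measure against
a reference alignment, so the following Python sketch shows how such scores could be
computed. The function name evaluate_alignment and the representation of an alignment
as a set of (source instance, target instance) pairs are assumptions for illustration,
and the sample pairs are made-up identifiers, not results from the thesis.

```python
def evaluate_alignment(predicted, reference):
    """Return (precision, recall, f_measure) for a predicted alignment.

    Both arguments are iterables of (source_instance, target_instance) pairs.
    """
    predicted, reference = set(predicted), set(reference)
    correct = len(predicted & reference)  # pairs found in both alignments
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Illustrative identifiers only.
predicted = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}
reference = {("a1", "b1"), ("a2", "b2"), ("a4", "b4"), ("a5", "b5")}
precision, recall, f_measure = evaluate_alignment(predicted, reference)
print(f"precision={precision:.3f} recall={recall:.3f} f_measure={f_measure:.3f}")
```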