UPM Institutional Repository

An approach for matching relational database schemas


Citation

Karasneh, Yaser Mohammad (2011) An approach for matching relational database schemas. Masters thesis, Universiti Putra Malaysia.

Abstract

Database schema integration aims to provide a uniform and consistent view, called the global schema, over a set of autonomous and heterogeneous data sources, so that data residing in different sources can be accessed as if they were in a single schema. Schema matching is the most crucial phase of schema integration and needs considerable attention, as its outcomes influence the correctness and completeness of the integrated (global) schemas. Manually specifying schema matches is tedious, time-consuming, error-prone, and therefore expensive, which is a growing problem given the rapidly increasing number of data sources to integrate. Automating this process to make it faster and less labor-intensive has therefore been one of the main tasks in schema integration. Although several solutions have been proposed, they remain limited: they do not exploit most of the available information related to schemas, which affects the result of integration. This thesis presents an approach for matching heterogeneous relational database schemas that utilizes most of the information related to schemas. Our solution takes into consideration both structural and semantic heterogeneities and offers data/schema integration without user intervention. Six matchers are introduced, namely (i) Name of the Databases’ Schemas Matcher (NDSM), (ii) Relation Schema Matcher (RSM), (iii) Attribute Name Matcher (ANM), (iv) Data Type Matcher (DTM), (v) Constraint Matcher (CM), and (vi) Instance Data Matcher (IDM). Matching based on the names of the database schemas, the names of the relation schemas, and the names of the attributes is accomplished using two methods, namely n-gram and synonym. In addition, our solution is domain independent, as it does not rely on any specific rules of a particular domain, and hence predefined knowledge of the domain is not required.
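As a rough illustration of how n-gram and synonym name matching can work, the following Python sketch compares two schema element names. It is not the thesis's implementation: the abstract does not state the n-gram size, the similarity measure, the threshold, or the synonym source, so the character trigrams, the Dice coefficient, the 0.5 threshold, and the tiny `SYNONYMS` table below are all assumptions chosen for the example.

```python
def ngrams(name, n=3):
    """Return the set of character n-grams of a normalized name."""
    # Normalize: lowercase and drop separators such as "_" or spaces.
    s = "".join(ch for ch in name.lower() if ch.isalnum())
    if len(s) < n:
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Dice coefficient over the two n-gram sets, in the range 0.0 to 1.0."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


# Hypothetical synonym groups; a real matcher would consult a thesaurus.
SYNONYMS = [{"patient", "client"}, {"doctor", "physician"}]


def are_synonyms(a, b):
    """True if both names appear in the same synonym group."""
    a, b = a.lower(), b.lower()
    return any(a in group and b in group for group in SYNONYMS)


def names_match(a, b, threshold=0.5):
    """Match if the names are synonyms or similar enough by n-grams."""
    return are_synonyms(a, b) or ngram_similarity(a, b) >= threshold


if __name__ == "__main__":
    for left, right in [("Patient_ID", "PatientId"),
                        ("doctor", "physician"),
                        ("doctor", "address")]:
        print(left, right, names_match(left, right))
```

The two checks complement each other: n-grams catch spelling and formatting variants of the same name, while the synonym lookup catches names that share no characters but mean the same thing.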
This thesis also shows that the produced integrated schemas (global schemas) maintain the properties of the initial input schemas and the characteristics of the relational model. Our approach achieved a precision (P) of 91%, a recall (R) of 84%, and an F-measure (F) of 88% for the biomedical domain, and P of 82%, R of 70%, and F of 76% for the hospital domain. These are the highest scores obtained compared with configurations in which fewer elements are considered during the matching process.
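The abstract does not define P, R, and F. Assuming they are the standard precision, recall, and F-measure (harmonic mean of the two) used to evaluate schema matchers, the sketch below shows how such scores can be computed from a set of proposed correspondences and a reference set. The function names and the sample correspondences are hypothetical and are not taken from the thesis.

```python
def f_measure(p, r):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0


def evaluate(found, expected):
    """Return (precision, recall, F) for a set of proposed matches.

    found:    correspondences produced by the matcher
    expected: correspondences in the manually built reference
    """
    correct = len(found & expected)
    p = correct / len(found) if found else 0.0
    r = correct / len(expected) if expected else 0.0
    return p, r, f_measure(p, r)


if __name__ == "__main__":
    # Hypothetical attribute correspondences, for illustration only.
    found = {("pat_name", "patient_name"), ("dob", "birth_date"),
             ("ward", "address")}
    expected = {("pat_name", "patient_name"), ("dob", "birth_date"),
                ("doc_id", "physician_id"), ("tel", "phone")}
    p, r, f = evaluate(found, expected)
    print(f"P={p:.2f} R={r:.2f} F={f:.2f}")
```

Under this assumed definition, the hospital-domain figures are consistent with each other: `f_measure(0.82, 0.70)` is about 0.76.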



Additional Metadata

Item Type: Thesis (Masters)
Subject: Relational databases
Subject: Database management
Call Number: FSKTM 2011 23
Chairman Supervisor: Associate Professor Hamidah Ibrahim, PhD
Divisions: Faculty of Computer Science and Information Technology
Depositing User: Haridan Mohd Jais
Last Modified: 21 Nov 2013 07:43
URI: http://psasir.upm.edu.my/id/eprint/26989