UPM Institutional Repository

Enhanced normalization approach addressing stop-word complexity in compound-word schema labels


Citation

Hossain, Jafreen and Mohd Sani, Nor Fazlida and Affendey, Lilly Suriani and Ishak, Iskandar and Kasmiran, Khairul Azhar (2017) Enhanced normalization approach addressing stop-word complexity in compound-word schema labels. Journal of Theoretical and Applied Information Technology, 95 (12). pp. 2635-2646. ISSN 1992-8645; ESSN: 1817-3195

Abstract

An extensive review of existing schema matching approaches revealed room for improvement in semantic schema matching. Normalization and lexical annotation methods using WordNet have been reasonably successful in general cases. In the presence of stop-words, however, these approaches yield poor accuracy. Most previous studies have ignored stop-words, which results in false negatives. This paper proposes NORMSTOP (NORMalizer of schemata having STOP-words), an improved schema normalization approach that addresses the complexity of stop-words (e.g. ‘by’, ‘at’, ‘and’, ‘or’) in Compound Word (CW) schema labels. Using a combined set of WordNet features, NORMSTOP isolates these labels during the preprocessing stage and resets the base form to a relevant WordNet term or an annotatable compound noun. When tested on the same real dataset used for the earlier approach, NORMS (NORMalizer of Schemata), NORMSTOP shows up to 13% improvement in annotation recall. This improvement brings the overall schema matching process a step closer to perfect accuracy, while its absence leaves a gap in expected performance, especially in today’s databases, where stop-words abound.
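The isolation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' NORMSTOP implementation: the stop-word list, the tokenization rules, and the function names `tokenize_label` and `isolate_stop_word_labels` are all assumptions made for illustration, and the later step of resetting the base form through WordNet is omitted entirely.

```python
import re

# Illustrative stop-word list (an assumption); the paper's actual list is not given here.
STOP_WORDS = {"by", "at", "and", "or", "of", "in", "to", "for"}


def tokenize_label(label):
    """Split a schema label on whitespace, underscores, hyphens, and camelCase boundaries."""
    # Insert a space wherever a lowercase letter or digit is followed by an uppercase letter.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)
    return [tok.lower() for tok in re.split(r"[\s_\-]+", spaced) if tok]


def isolate_stop_word_labels(labels):
    """Separate compound-word labels that contain stop-words from all other labels.

    Returns (flagged, plain): flagged maps each such label to the stop-words
    found in it; plain lists the remaining labels in their original order.
    """
    flagged, plain = {}, []
    for label in labels:
        tokens = tokenize_label(label)
        hits = [tok for tok in tokens if tok in STOP_WORDS]
        if len(tokens) > 1 and hits:
            flagged[label] = hits
        else:
            plain.append(label)
    return flagged, plain


flagged, plain = isolate_stop_word_labels(["orderedBy", "CustomerName", "date_of_birth"])
print(flagged)  # labels that would need stop-word-aware normalization
print(plain)    # labels that can go through ordinary normalization
```

In a fuller pipeline, the flagged labels would then be passed to a WordNet-based step that selects a relevant term or compound noun; that step depends on details not available on this page.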


Additional Metadata

Item Type: Article
Divisions: Faculty of Computer Science and Information Technology
Publisher: Asian Research Publication Network
Keywords: Database integration; Schema matching; Data heterogeneity; Semantic schema matching; Schema label normalization; Stop-words
Depositing User: Nurul Ainie Mokhtar
Date Deposited: 10 Jan 2019 08:15
Last Modified: 10 Jan 2019 08:15
URI: http://psasir.upm.edu.my/id/eprint/61723