Citation
Sofian, Hazrina
(2018)
Adaptive model for semantic question answering disambiguation over linked data.
Doctoral thesis, Universiti Putra Malaysia.
Abstract
Semantic Question Answering (SQA) accepts natural language (NL) questions from
users and presents the exact answer retrieved from linked data. It requires three
kinds of disambiguation: NL question disambiguation, linked data environment
disambiguation and multi-type word disambiguation. Firstly, NL question
disambiguation covers three meta-mapping aspects: the variation of question
patterns, question complexity and the linguistic terminology of the NL questions
posed by users. Secondly, linked data disambiguation covers another four
meta-mapping aspects: the variation of datatypes, resource heterogeneity,
knowledge base (KB) concept terminology and the variation of structure in the
linked data. Thirdly, word disambiguation involves disambiguating between the
linguistic terminology and the KB concept terminology. These three
disambiguations need to be addressed simultaneously because an empirical study
carried out in this research found that the components of a SPARQL Protocol and
RDF Query Language (SPARQL) query are determined by these seven meta-mapping
aspects.
Most existing research modifies the questions manually, selects only certain
patterns of NL questions, or selects only simple questions from the dataset.
Moreover, certain processes are semi-automated, as some SQA systems rely heavily
on predetermined lexicon knowledge for word disambiguation or on manually
annotated mappings for SPARQL query construction. However, manual or
semi-automated processes are unable to cater for new question patterns posed by
users or to adapt to the contents of linked data, which are ever-changing and
incrementally growing.
These limitations motivate this research to first design the Adaptive-based
Natural Language Disambiguation (ANLD) model, which is integrated with the
Linguistic-based SPARQL Translation Model (LBSTM), a selective part-of-speech
(POS) tag extraction technique, a composition of syntactic representation
technique and a model matching technique to disambiguate NL questions. Next, this
research designs the Adaptive-based Linked Data Structure Disambiguation (ALID)
model, which is executed if the output of the ANLD model is not able to retrieve
an answer from the linked data. ALID uses a component-based approach and a
feedback loop approach to disambiguate the linked data environment and to
resolve word ambiguity.
Precision, recall and F-measure are used as performance metrics to evaluate the
accuracy of the SPARQL queries, which are the outputs of this research. The
accuracy is evaluated by comparing the constructed SPARQL queries with the
gold standard results provided by the dataset. The results illustrate that the
adaptive models are able to perform the three SQA disambiguations
simultaneously without manual modification. These achievements enable
autonomous translation of NL questions into SPARQL queries, for users with
unpredictable question-writing styles and against linked data that is
incrementally growing in size and complexity.
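The three metrics named above can be illustrated with a minimal sketch. The
abstract does not say how the scores are computed, so this assumes that the
answers returned by a constructed SPARQL query are compared as a set against
the gold standard answers for the same question. The function name and the
sample resource identifiers are hypothetical.

```python
def precision_recall_f_measure(retrieved, gold):
    """Score one question by comparing answer sets (assumed set-based scoring)."""
    retrieved, gold = set(retrieved), set(gold)
    correct = len(retrieved & gold)  # answers that are both returned and expected

    # Guard against empty sets so the ratios are always defined.
    precision = correct / len(retrieved) if retrieved else 0.0
    recall = correct / len(gold) if gold else 0.0

    if precision + recall == 0:
        f_measure = 0.0
    else:
        # Balanced F-measure: harmonic mean of precision and recall.
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example: the query returns two resources, one of which is expected.
retrieved_answers = {"dbr:Berlin", "dbr:Hamburg"}
gold_answers = {"dbr:Berlin"}
precision, recall, f_measure = precision_recall_f_measure(retrieved_answers, gold_answers)
print(precision, recall, round(f_measure, 3))
```

In this example half of the returned answers are correct (precision 0.5) and the
single expected answer is found (recall 1.0).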