Corpus-based analysis on cross-domain experiments in classification-and-ranking generation

Aida, Mustapha and Sulaiman, Md. Nasir and Mahmod, Ramlan and Selamat, Mohd. Hasan (2010) Corpus-based analysis on cross-domain experiments in classification-and-ranking generation. Journal of Computer Science, 6 (11). pp. 1305-1312. ISSN 1549-3636

Full text not available from this repository.

Abstract

Problem statement: The overgeneration-and-ranking architecture works well for written language, where the sentence is the basic unit. In spoken language, however, where the utterance is the basic unit, its disadvantage becomes critical because spoken language also renders intentions, so short strings may have equivalent impact. Approach: In classification-and-ranking, the response was deliberately chosen from a dialogue corpus rather than wholly generated, which allows short, ungrammatical utterances as long as they satisfy the intended meaning of the input utterance. Because the architecture is intention-based, it adopted an open-domain knowledge representation, whereby response utterances were semantically represented using an ontology general enough for future reuse in another domain. Results: This study presented a corpus-based analysis of cross-domain experimentation using different types of corpora to validate the consistency of the response classifier that delimits the search space for ranking. The open-domain quality of the classification-and-ranking architecture was tested on two mixed-initiative, transaction dialogue corpora, in theater reservation and emergency planning. Results showed a consistent distribution of accuracies in both the classification and ranking experiments, indicating that the approach is viable for cross-domain implementations. Conclusion: The ability of a response generation system to learn response utterances directly from the domain corpus suggested the possibility of building a dialogue system by feeding the learning module with a target corpus, with the system learning the response behavior directly from the training corpus.

Item Type: Article
Keyword: Corpus-based; Open-domain; Natural language generation; Dialogue systems
Subject: Computational linguistics.
Subject: Natural language processing (Computer science).
Faculty or Institute: Faculty of Computer Science and Information Technology
Publisher: Science Publications
DOI Number: 10.3844/jcssp.2011.59.64
Altmetrics: http://www.altmetric.com/details.php?domain=psasir.upm.edu.my&doi=10.3844/jcssp.2011.59.64
ID Code: 13804
Deposited By: Umikalthom Abdullah
Deposited On: 04 Apr 2012 03:10
Last Modified: 04 Apr 2012 03:10



Universiti Putra Malaysia Institutional Repository
