UPM Institutional Repository

Comparing two corpus-based methods for extracting paraphrases to dictionary-based method


Ho, Chuk Fong and Azmi Murad, Masrah Azrifah and Abdul Kadir, Rabiah and C. Doraisamy, Shyamala (2011) Comparing two corpus-based methods for extracting paraphrases to dictionary-based method. International Journal of Semantic Computing, 5 (2). pp. 133-178. ISSN 1793-351X; ESSN: 1793-7108


Paraphrase extraction plays an increasingly important role in language-related research and applications in areas such as information retrieval, question answering and automatic machine evaluation. Most existing methods extract paraphrases from different types of corpora by using syntactic-based approaches. Because a syntactic-based approach relies on similarity of context to identify and capture paraphrases, it also tends to extract terms that are not paraphrases but appear in similar contexts, such as loosely related terms and functionally similar yet unrelated terms. In addition, different types of corpora suffer from different kinds of problems, such as limited availability and domain bias. This paper presents a solely semantic-based paraphrase extraction model. The model collects paraphrases from multiple lexical resources and validates them semantically in three ways: by computing domain similarity, definition similarity and word similarity. The model is benchmarked against two outstanding syntactic-based approaches. Experimental results from a manual evaluation show that the proposed model outperforms the benchmarks. The results indicate that a semantic-based approach should be applied in paraphrase extraction instead of a syntactic-based approach. They further suggest that a hybrid of the two approaches should be applied if one targets strictly precise paraphrases.
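
The three-way semantic validation described in the abstract can be illustrated with a toy sketch. The abstract does not say how the three similarities are computed or combined, so everything here is an assumption made for illustration only: the tiny hand-written `LEXICON`, the use of Jaccard overlap as a stand-in for each similarity measure, the rule that all three scores must pass, and the `threshold` value are hypothetical and are not the authors' method.

```python
# Toy illustration only: a hypothetical lexicon standing in for the
# "multiple lexical resources" mentioned in the abstract.
LEXICON = {
    "buy": {
        "domains": {"commerce"},
        "definition": "obtain something by paying money for it",
        "related": {"purchase", "acquire"},
    },
    "purchase": {
        "domains": {"commerce"},
        "definition": "obtain something by paying money",
        "related": {"buy", "acquire"},
    },
    "bank": {
        "domains": {"finance", "geography"},
        "definition": "an institution that keeps and lends money",
        "related": {"lender"},
    },
}


def jaccard(a, b):
    """Jaccard overlap of two collections (assumed stand-in measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def domain_similarity(x, y):
    """Overlap of the domain labels attached to each term."""
    return jaccard(LEXICON[x]["domains"], LEXICON[y]["domains"])


def definition_similarity(x, y):
    """Overlap of the words used in each term's definition."""
    return jaccard(LEXICON[x]["definition"].split(),
                   LEXICON[y]["definition"].split())


def word_similarity(x, y):
    """Overlap of each term's related-word set, including the term itself."""
    return jaccard(LEXICON[x]["related"] | {x}, LEXICON[y]["related"] | {y})


def is_paraphrase(x, y, threshold=0.5):
    """Accept a candidate pair only if all three checks pass (assumed rule)."""
    scores = (
        domain_similarity(x, y),
        definition_similarity(x, y),
        word_similarity(x, y),
    )
    return all(score >= threshold for score in scores)


if __name__ == "__main__":
    for pair in [("buy", "purchase"), ("buy", "bank")]:
        print(pair, is_paraphrase(*pair))
```

In this sketch the pair "buy" / "purchase" passes all three checks, while "buy" / "bank" fails each of them. A real system would replace the toy lexicon and the overlap measure with actual lexical resources and similarity metrics.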

Additional Metadata

Item Type: Article
Divisions: Faculty of Computer Science and Information Technology
DOI Number: https://doi.org/10.1142/S1793351X11001225
Publisher: World Scientific Publishing
Keywords: Paraphrase extraction; Semantics; Lexical resource; Word similarity; Sentence similarity; Domain similarity
Depositing User: Nabilah Mustapa
Date Deposited: 08 Jun 2016 09:00
Last Modified: 08 Jun 2016 09:00
Altmetrics: http://www.altmetric.com/details.php?domain=psasir.upm.edu.my&doi=10.1142/S1793351X11001225
URI: http://psasir.upm.edu.my/id/eprint/22466