UPM Institutional Repository

Stylometric authorship balanced attribution prediction method


Citation

Mustafa, Tareef Kamil (2011) Stylometric authorship balanced attribution prediction method. PhD thesis, Universiti Putra Malaysia.

Abstract

Stylometric authorship attribution is an important approach in text mining that has received growing attention because of the delicacy of the task. It analyzes texts such as novels and plays written by famous authors and tries to measure their writing style by choosing attributes that belong uniquely to each author, on the assumption that every author has a distinctive artistic way of writing that no other author shares. Two major problems hold back progress in this field: the accuracy of the predictions and the reliance on human expert judgment. Existing prediction techniques use either statistical attributes such as frequent words or more sophisticated semantic techniques such as lexicons. Nonetheless, their results are still not sufficiently accurate. In this research, we propose a new stylometric method, the Stylometric Authorship Balanced Attribution (SABA) method, which addresses these problems by giving higher prediction accuracy while being independent of human judgment, meaning that it does not rely on domain experts. The new method is implemented by merging three methods: the computational approach, the Winnow algorithm, and the Burrows-delta method. It also uses a set of attributes that are more effective than those of the frequent-words method. This yields higher stylometric prediction accuracy than previously achieved, because the attributes provide more evidence of an author's artistic writing style for authorship recognition and prediction. The effective attributes are the word pair and the trio, both of which are multiple-word attributes. The proposed SABA method is compared against three other methods: the computational approach, the Winnow algorithm, and the Burrows-delta method. The results show that the proposed method produces superior prediction accuracy and even gives a completely correct result in the final stage of the experiment.
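To make the word-pair and trio attributes and the Burrows-delta comparison more concrete, the following is a minimal Python sketch and not the SABA implementation from the thesis. It assumes a simple lowercase tokenizer, treats word pairs and trios as contiguous two-word and three-word sequences, and applies a Delta-style distance (mean absolute difference of z-scored relative frequencies) to those sequences rather than to single frequent words. The names tokenize, ngrams, relative_freqs, and burrows_delta, the top_k cutoff, and the use of the population standard deviation are all illustrative assumptions.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    # Assumption: lowercase alphabetic tokens; the thesis may tokenize differently.
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    # n=2 gives word pairs, n=3 gives trios (contiguous sequences, by assumption).
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_freqs(text, n):
    # Relative frequency of each n-word attribute within one text.
    grams = ngrams(tokenize(text), n)
    total = len(grams) or 1
    return {g: c / total for g, c in Counter(grams).items()}


def burrows_delta(candidates, disputed, n=2, top_k=50):
    """Return {author: delta}; a lower delta means a closer stylistic match.

    candidates maps an author name to a sample of that author's text.
    """
    profiles = {a: relative_freqs(t, n) for a, t in candidates.items()}
    test = relative_freqs(disputed, n)

    # Feature set: the top_k most frequent attributes in the pooled candidate texts.
    pooled = Counter()
    for t in candidates.values():
        pooled.update(ngrams(tokenize(t), n))
    features = [g for g, _ in pooled.most_common(top_k)]

    # Mean and standard deviation of each attribute across the candidate authors.
    stats = {}
    for g in features:
        vals = [p.get(g, 0.0) for p in profiles.values()]
        sd = pstdev(vals)  # simplification: population standard deviation
        if sd > 0:  # skip attributes that do not vary between authors
            stats[g] = (mean(vals), sd)

    deltas = {}
    for author, prof in profiles.items():
        diffs = [
            abs((test.get(g, 0.0) - mu) / sd - (prof.get(g, 0.0) - mu) / sd)
            for g, (mu, sd) in stats.items()
        ]
        deltas[author] = mean(diffs) if diffs else float("inf")
    return deltas


if __name__ == "__main__":
    samples = {
        "A": "the cat sat on the mat the cat sat",
        "B": "a dog ran in a park a dog ran",
    }
    scores = burrows_delta(samples, "the cat sat on the mat", n=2)
    print(min(scores, key=scores.get), scores)
```

Calling burrows_delta with n=3 switches the attribute from word pairs to trios. The Winnow algorithm and the computational approach named in the abstract are not sketched here.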


Download File

FSKTM 2011 16R.pdf (PDF, 844kB)

Additional Metadata

Item Type: Thesis (PhD)
Subject: Text processing (Computer science)
Subject: Authorship - Style manuals
Subject: Prediction (Logic)
Call Number: FSKTM 2011 16
Chairman Supervisor: Norwati Mustapha, PhD
Divisions: Faculty of Computer Science and Information Technology
Depositing User: Haridan Mohd Jais
Date Deposited: 27 Feb 2014 00:53
Last Modified: 27 Feb 2014 00:53
URI: http://psasir.upm.edu.my/id/eprint/27377
