UPM Institutional Repository

Optimization of extractive Automatic Text Summarization using Decomposition-based Multi-objective Differential Evolution and parallelization


Citation

Hazmi Wahab, Muhammad Hafizul (2024) Optimization of extractive Automatic Text Summarization using Decomposition-based Multi-objective Differential Evolution and parallelization. Doctoral thesis, Universiti Putra Malaysia.

Abstract

The central challenge in Automatic Text Summarization (ATS) is generating text summaries efficiently with optimization algorithms, a critical component of systems that process textual information. The current approach suffers from long execution times, especially when complex optimization techniques are combined with a computationally expensive ATS repair operator that repairs multiple candidate solutions. Although it yields high Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores for the generated summaries, it is inefficient, mainly because of the substantial optimization time consumed by the ATS repair operator scheme. To address this, a novel solution called Decomposition-based Multi-objective Differential Evolution (MODE/D) is proposed. It builds on Differential Evolution for Multi-objective Optimization (DEMO) and the weighted sum (WS) method, coupled with a new ATS repair operator scheme. MODE/D is validated experimentally on Document Understanding Conferences (DUC) datasets, with the results evaluated using ROUGE metrics. The outcomes are twofold: a substantial reduction in serial execution time and an improvement over existing techniques in the literature, as evidenced by higher ROUGE-1, ROUGE-2, and ROUGE-L scores. A multi-core variant of MODE/D is also explored; it is stable and achieves high efficiency when static loop scheduling is employed. In a multi-core environment, parallel multi-core MODE/D attains a speedup of 2 over the serial version of MODE/D, with efficiency peaking at 86.35% when 6 CPU cores are used.
Additionally, when the input size is tripled, the parallel multi-core MODE/D achieves a speedup of 7.9 with 98.98% efficiency under static scheduling. This speedup comes with a slight degradation in ROUGE-2 scores. Nevertheless, the result underscores the robustness and scalability of the proposed approach: it harnesses the computational power of multiple cores while keeping summary quality metrics stable, yielding 31 words per second (WPS), a 233.13% increase over its serial counterpart for topic d061j in DUC2002. Furthermore, two GPU variants of GMODE/D, namely variant I and variant II, are implemented, each with unified and non-unified memory architectures. Variant I performs sentence scoring at the outset of the accelerator region, while variant II performs sentence scoring within the accelerator region. GMODE/D variant I with unified memory achieves a speedup of 18.17 over the serial variant when a vector size of 256 is used with an NVIDIA Tesla V100 as the accelerator device, raising throughput to 215.517 WPS. Despite a slight reduction in ROUGE scores, it exhibits the most stable CV values among the serial, multi-core, and many-core variants. Together, these advances bring optimization-based ATS approaches closer to real-time applications where thousands of documents could be involved, and demonstrate the versatility and efficiency of the proposed MODE/D algorithm across diverse computing architectures, including multi-core and many-core environments.
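
The speedup, efficiency, and WPS figures quoted in the abstract can be related to one another with the textbook definitions of parallel speedup (serial time divided by parallel time) and efficiency (speedup divided by the number of workers). The Python sketch below is illustrative only: the function names are hypothetical, the thesis may define these quantities differently, and the core count for the tripled-input run is inferred here rather than stated in the abstract. It also assumes that "a 233.13% increase" means the parallel rate equals the serial rate multiplied by (1 + 2.3313).

```python
def speedup(serial_time: float, parallel_time: float) -> float:
    """Textbook speedup: how many times faster the parallel run is."""
    return serial_time / parallel_time


def efficiency(speedup_value: float, workers: int) -> float:
    """Textbook efficiency: speedup per worker, as a fraction of 1."""
    return speedup_value / workers


def implied_workers(speedup_value: float, efficiency_value: float) -> float:
    """Worker count implied by a reported speedup and efficiency."""
    return speedup_value / efficiency_value


def baseline_from_increase(new_value: float, percent_increase: float) -> float:
    """Baseline implied by a value quoted as a percentage increase over it."""
    return new_value / (1 + percent_increase / 100)


# Figures quoted in the abstract for the tripled input size.
cores = implied_workers(7.9, 0.9898)

# Assumed reading of "31 WPS, a 233.13% increase" over the serial run.
serial_wps = baseline_from_increase(31, 233.13)

print(f"implied core count: {cores:.2f}")
print(f"implied serial throughput: {serial_wps:.2f} WPS")
```

Under these assumptions, the tripled-input figures are consistent with a run on about 8 cores, and the quoted 31 WPS corresponds to a serial throughput of roughly 9.3 WPS.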


Official URL or Download Paper: http://ethesis.upm.edu.my/id/eprint/18497

Additional Metadata

Item Type: Thesis (Doctoral)
Subject: Optimization algorithms (Computer science)
Subject: Parallel processing (Computer science)
Subject: Natural language processing (Computer science)
Call Number: FSKTM 2024 11
Chairman Supervisor: Associate Professor Nor Asilah Wati binti Abdul Hamid, PhD
Divisions: Faculty of Computer Science and Information Technology
Keywords: Automatic Text Summarization, Document Understanding Conferences, Multi-objective Differential Evolution, Multi-objective Artificial Bee Colony.
Depositing User: Ms. Rohana Alias
Date Deposited: 09 Oct 2025 08:27
Last Modified: 09 Oct 2025 08:27
URI: http://psasir.upm.edu.my/id/eprint/120029