UPM Institutional Repository

Algorithms for moderating effect of emotional value from a cross-media data fusion perspective: a case study of Chinese dating reality shows


Citation

Zhang, Shasha and Dong, Qiming and Yasin, Megat Al Imran and Fang, Ng Chwee (2026) Algorithms for moderating effect of emotional value from a cross-media data fusion perspective: a case study of Chinese dating reality shows. Journal of Multiscale Modelling. art. no. 2640012. ISSN 1756-9737; eISSN: 1756-9745 (In Press)

Abstract

This research presents a new algorithmic method for moderating emotional content in Chinese dating reality shows based on cross-media analysis, combining text, audio, video, and social media feedback. The model architecture comprises five functional layers: Data Acquisition, Multimodal Preprocessing, Cross-Media Feature Extraction, Emotional Value Detection and Moderation, and Interpretability and Visualization. Raw multimodal data is processed through Automatic Speech Recognition (ASR), facial emotion recognition, and voice analysis to align and organize inputs. Preprocessing removes noise from text data, normalizes sentiment word lists, and applies temporal transformation. In the feature-extraction stage, different machine-learning models are applied: Bidirectional Encoder Representations from Transformers (BERT) or Enhanced Representation through Knowledge Integration (ERNIE) for text; Convolutional Recurrent Neural Network (CRNN) and Bidirectional Long Short-Term Memory (Bi-LSTM) for audio; and Residual Neural Network (ResNet50) and Inflated 3D Convolutional Network (I3D) for video. The Multimodal Transformer Fusion (MMTF) model uses cross-modal attention mechanisms to combine these data streams into unified emotional representations. These representations are classified into emotions such as joy, sadness, love, jealousy, conflict, and embarrassment by a Multi-Layer Perceptron (MLP) with softmax activation. To control content dynamically, a Deep Q-Network (DQN)-based Reinforcement Learning (RL) engine moderates scenes according to cultural standards and viewer tolerance. Interpretability is supported through SHapley Additive exPlanations (SHAP), attention heatmaps, and interactive dashboards.
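The MMTF fusion layer described above rests on cross-modal attention, where features from one modality attend over features from another. A minimal NumPy sketch of one such attention head follows; the function names, dimensions, and toy data are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_feats, key_feats, d_k):
    """One cross-modal attention head: features from one modality (e.g. text
    tokens) attend over features from another (e.g. audio or video frames)."""
    scores = query_feats @ key_feats.T / np.sqrt(d_k)  # (Tq, Tk) affinities
    weights = softmax(scores, axis=-1)                 # each row sums to 1
    return weights @ key_feats                         # (Tq, d_k) fused output

# Toy example: 4 text-token vectors attending over 6 audio-frame vectors.
rng = np.random.default_rng(0)
text = rng.standard_normal((4, 8))
audio = rng.standard_normal((6, 8))
fused = cross_modal_attention(text, audio, d_k=8)
print(fused.shape)  # (4, 8)
```

In a full transformer-style fusion module this head would be repeated per modality pair and per layer, with learned projections for queries, keys, and values; the sketch keeps only the attention arithmetic itself.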
This research holds significant implications for the fields of Communication, Radio, and Television, as it enhances content moderation strategies in emotionally charged programming through intelligent cross-media data fusion.
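The classification step, an MLP with softmax activation over the fused representation, can be sketched as follows; the layer sizes, weights, and two-layer shape are illustrative assumptions, only the six emotion labels come from the abstract:

```python
import numpy as np

EMOTIONS = ["joy", "sadness", "love", "jealousy", "conflict", "embarrassment"]

def softmax(z):
    # Numerically stable softmax over a 1-D vector of logits.
    e = np.exp(z - z.max())
    return e / e.sum()

def mlp_emotion_head(fused_vec, w1, b1, w2, b2):
    """Two-layer perceptron head: fused multimodal vector -> emotion distribution."""
    hidden = np.maximum(0.0, fused_vec @ w1 + b1)  # ReLU hidden layer
    return softmax(hidden @ w2 + b2)               # probabilities over EMOTIONS

# Toy forward pass with random (untrained) weights.
rng = np.random.default_rng(1)
d, h = 8, 16
probs = mlp_emotion_head(
    rng.standard_normal(d),
    rng.standard_normal((d, h)), np.zeros(h),
    rng.standard_normal((h, len(EMOTIONS))), np.zeros(len(EMOTIONS)),
)
predicted = EMOTIONS[int(np.argmax(probs))]
```

The softmax output sums to 1, so each scene segment receives a proper probability distribution over the six emotion categories rather than independent scores.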
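The DQN-based moderation engine frames moderation as a reinforcement-learning problem: the state summarizes a scene's emotional profile, actions are moderation decisions, and the reward encodes cultural standards and viewer tolerance. A minimal Q-learning sketch follows; the action names, the linear Q-function (standing in for the paper's deep network), and all numbers are hypothetical:

```python
import numpy as np

ACTIONS = ["keep", "soften", "cut"]  # hypothetical moderation actions

def q_values(state, w):
    """Tiny linear Q-function standing in for the paper's Deep Q-Network."""
    return state @ w  # one Q-value per moderation action

def select_action(state, w, epsilon, rng):
    """Epsilon-greedy policy: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return int(rng.integers(len(ACTIONS)))
    return int(np.argmax(q_values(state, w)))

def td_update(w, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    target = reward + gamma * q_values(next_state, w).max()
    error = target - q_values(state, w)[action]
    w[:, action] += alpha * error * state  # gradient step for a linear Q
    return w

# One interaction: state could be emotion probabilities plus intensity features.
rng = np.random.default_rng(2)
state = rng.standard_normal(6)
w = np.zeros((6, len(ACTIONS)))
a = select_action(state, w, epsilon=0.1, rng=rng)
w = td_update(w, state, a, reward=1.0, next_state=rng.standard_normal(6))
```

A real DQN would replace the linear Q-function with a neural network and add experience replay and a target network; the sketch shows only the decision loop and temporal-difference update that such an engine iterates.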


Download File

Full text not available from this repository.

Additional Metadata

Item Type: Article
Subject: Modeling and Simulation
Subject: Computer Science Applications
Divisions: Faculty of Modern Language and Communication
DOI Number: https://doi.org/10.1142/S1756973726400123
Publisher: World Scientific
Keywords: Content moderation; Cross-media data fusion; Emotion moderation; Multimodal learning; Reinforcement learning
Depositing User: Ms. Nur Faseha Mohd Kadim
Date Deposited: 12 Mar 2026 07:25
Last Modified: 12 Mar 2026 07:25
Altmetrics: http://www.altmetric.com/details.php?domain=psasir.upm.edu.my&doi=10.1142/S1756973726400123
URI: http://psasir.upm.edu.my/id/eprint/123552
