Target-conditioned Triple-Path Consistency for distributional music emotion regression

Citation

Hu, Qiong and Azmi Murad, Masrah Azrifah and Azman, Azreen and Nasharuddin, Nurul Amelina (2026) Target-conditioned Triple-Path Consistency for distributional music emotion regression. Knowledge-Based Systems, 336. art. no. 115317. pp. 1-14. ISSN 0950-7051

Abstract

Music Emotion Recognition systems require nuanced representations that capture emotional mixtures, a task where discrete tags or two-dimensional valence–arousal coordinates often fall short. We present Triple-Path Consistency (TPC), a target-conditioned training framework for learning emotion distributions from audio. Our implementation, TPCNet, employs a compact CNN–BiLSTM front-end with cross-attention and an encoder–decoder backbone supporting three coordinated paths: a prediction path that generates logits from audio features, a target path that decodes ground-truth distributions into hierarchical feature anchors, and a consistency path that re-encodes these anchors to enforce multi-level alignment. This triangular consistency constraint ensures semantic coherence throughout the network without requiring external teachers. We optimize Kullback–Leibler divergence for distributional labels and compare it with Mean Squared Error for valence–arousal regression. Experiments on four benchmarks (S9k, CAL500, MTG-Jamendo, and PMEmo) demonstrate competitive or state-of-the-art performance in distributional shape agreement, as measured by the Concordance Correlation Coefficient and Spearman correlation. These results are achieved with the TPC backbone adding only 0.26M trainable parameters to a compact 19.39M-parameter system, enabling lightweight deployment. Statistical significance tests confirm that TPC's advantage lies in modeling structural integrity rather than point-wise accuracy. The results establish TPC as a practical framework for affect-aware multimedia systems.

Download File

123070.pdf (Published Version, text, 6MB). Restricted to repository staff only.

Official URL or Download Paper: https://www.sciencedirect.com/science/article/pii/...

Additional Metadata

Item Type:	Article
Subject:	Management Information Systems
Subject:	Software
Divisions:	Faculty of Computer Science and Information Technology
DOI Number:	https://doi.org/10.1016/j.knosys.2026.115317
Publisher:	Elsevier
Keywords:	Consistency training; Label distribution learning; Music emotion recognition; Soft label distribution; Triple-path consistency
Depositing User:	MS. HADIZAH NORDIN
Date Deposited:	10 Apr 2026 07:50
Last Modified:	10 Apr 2026 07:50
Altmetrics:	http://www.altmetric.com/details.php?domain=psasir.upm.edu.my&doi=10.1016/j.knosys.2026.115317
URI:	http://psasir.upm.edu.my/id/eprint/123070