Citation
Mohd Aris, Teh Noranis and Chen, Ningning and Mustapha, Norwati and Zolkepli, Maslina
(2025)
Dual experience replay enhanced deep deterministic policy gradient for efficient continuous data sampling.
PLOS ONE, 20.
art. no. e0334411.
pp. 1-18.
ISSN 1932-6203
Abstract
To address the inefficiencies in sample utilization and policy instability in asynchronous distributed reinforcement learning, we propose TPDEB—a dual experience replay framework that integrates prioritized sampling and temporal diversity. While recent distributed RL systems have scaled well, they often suffer from instability and inefficient sampling under network-induced delays and stale policy updates—highlighting a gap in robust learning under asynchronous conditions. TPDEB significantly improves convergence speed and robustness by coordinating dual-buffer updates across distributed agents, offering a scalable solution to real-world continuous control tasks. TPDEB addresses these limitations through two key mechanisms: a trajectory-level prioritized replay buffer that captures temporally coherent high-value experiences, and KL-regularized learning that constrains policy drift across actors. Unlike prior approaches relying on a single experience buffer, TPDEB employs a dual-buffer strategy that combines standard and prioritized replay Buffers. This enables better trade-offs between unbiased sampling and value-driven prioritization, improving learning robustness under asynchronous actor updates. Moreover, TPDEB collects more diverse and redundant experience by scaling parallel actor replicas. Empirical evaluations on MuJoCo continuous control benchmarks demonstrate that TPDEB outperforms baseline distributed algorithms in both convergence speed and final performance, especially under constrained actor–learner bandwidth. Ablation studies validate the contribution of each component, showing that trajectory-level prioritization captures high-quality samples more effectively than step-wise methods, and KL-regularization enhances stability across asynchronous updates. These findings support TPDEB as a practical and scalable solution for distributed reinforcement learning systems.
Download File
Additional Metadata
Actions (login required)
 |
View Item |