Can experience replay stabilize MARL beyond replicator dynamics?
Experience-replay Innovative Dynamics
This paper introduces Experience-replay Innovative Dynamics (ERID), a new algorithm for multi-agent reinforcement learning (MARL) that targets the instability of existing methods, particularly in dynamically changing environments. ERID maintains a replay buffer of past experiences to smooth payoff estimates, and updates agent policies using alternative evolutionary dynamics such as Brown-von Neumann-Nash (BNN) and Smith dynamics, which offer better convergence to Nash equilibria than the replicator dynamics underlying many traditional methods.
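The core idea can be sketched in a few lines: estimate per-action payoffs from a replay buffer rather than from the latest sample alone, then take a discrete-time BNN step instead of a replicator step. This is a minimal illustration, not the paper's implementation; the payoff values, buffer size, and step size are all hypothetical.

```python
import numpy as np
from collections import deque

def bnn_step(x, payoffs, dt=0.1):
    """One discrete Brown-von Neumann-Nash (BNN) update of mixed strategy x.

    BNN shifts probability toward actions with positive excess payoff
    (payoff above the population average), unlike replicator dynamics,
    which only rescales existing probability mass.
    """
    avg = x @ payoffs                       # average payoff under x
    excess = np.maximum(payoffs - avg, 0)   # positive excess payoffs
    x_new = x + dt * (excess - x * excess.sum())
    return x_new / x_new.sum()              # project back onto the simplex

rng = np.random.default_rng(0)
n_actions = 3
true_payoffs = np.array([1.0, 0.5, 0.2])   # hypothetical stationary payoffs
x = np.full(n_actions, 1.0 / n_actions)    # start from the uniform strategy

# Replay buffer of (action, reward) pairs smooths the payoff estimates.
buffer = deque(maxlen=500)

for t in range(2000):
    a = rng.choice(n_actions, p=x)
    r = true_payoffs[a] + rng.normal(0, 0.1)   # noisy observed reward
    buffer.append((a, r))

    # Replay-averaged payoff estimate per action.
    sums = np.zeros(n_actions)
    counts = np.zeros(n_actions)
    for ai, ri in buffer:
        sums[ai] += ri
        counts[ai] += 1
    est = np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)

    x = bnn_step(x, est)

print(np.round(x, 3))  # mass concentrates on the highest-payoff action
```

Swapping `bnn_step` for a Smith-dynamics update (pairwise comparison of payoffs) follows the same pattern; the replay averaging is what dampens the noise that destabilizes single-sample policy updates.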
For LLM-based multi-agent systems, ERID offers a potential remedy for the non-stationarity and instability that arise when agents learn concurrently. Experience replay combined with alternative dynamics could make LLM agents more robust and adaptable in complex, dynamic interactions where time-averaged replicator dynamics struggle. This is especially relevant for applications with shifting objectives or changing environments, where a mechanism for continuous adaptation and convergence to stable solutions is needed.