Can federated actor-critic reliably learn across diverse environments?
Single-Loop Federated Actor-Critic across Heterogeneous Environments
This paper proposes Single-loop Federated Actor-Critic (SFAC), a method for training a shared reinforcement learning policy across multiple agents operating in different environments without directly sharing their data. The algorithm uses a two-level federated learning scheme, aggregating both the critic (value function) and the actor (policy) updates from individual agents. The convergence analysis shows that the asymptotic error is bounded by the heterogeneity across the agents' environments, and that convergence enjoys a linear speedup in the number of agents. This is relevant to LLM-based multi-agent systems because it offers a way to train shared or specialized LLMs across diverse, private settings, potentially improving generalization and enabling personalization without compromising data privacy. The evaluation objective of a mixture environment, in which one of the agents' environments is sampled at random, is also pertinent to LLM applications where diverse user needs must be met.
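To make the aggregation structure concrete, here is a minimal sketch of a single-loop federated actor-critic round: each agent interleaves one critic (TD) step and one actor (policy-gradient) step per sample, and a server periodically averages both parameter sets. The toy environments, feature map, step sizes, and variable names are illustrative assumptions, not the paper's exact setup.

```python
# Hypothetical sketch: single-loop federated actor-critic with two-level
# aggregation (critic weights and actor weights are both averaged).
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS, FEAT_DIM, ACTIONS = 4, 8, 3
LOCAL_STEPS, ROUNDS = 10, 50
ALPHA_CRITIC, BETA_ACTOR = 0.05, 0.01
GAMMA = 0.95

def features(state):
    """Toy state features (fixed nonlinear projection of a scalar state)."""
    return np.tanh(state * np.arange(1, FEAT_DIM + 1) / FEAT_DIM)

def step_env(state, action, drift):
    """Toy heterogeneous dynamics: each agent has its own drift term."""
    next_state = 0.8 * state + 0.1 * (action - 1) + drift + 0.05 * rng.standard_normal()
    reward = -abs(next_state)  # reward peaks when the state stays near 0
    return next_state, reward

def policy_probs(theta, phi):
    """Softmax policy over actions with linear logits."""
    logits = theta @ phi
    logits -= logits.max()
    p = np.exp(logits)
    return p / p.sum()

# Global (server) parameters: critic weights w, actor weights theta.
w_global = np.zeros(FEAT_DIM)
theta_global = np.zeros((ACTIONS, FEAT_DIM))
drifts = rng.uniform(-0.2, 0.2, size=N_AGENTS)  # environment heterogeneity
states = rng.standard_normal(N_AGENTS)

for rnd in range(ROUNDS):
    w_locals, theta_locals = [], []
    for i in range(N_AGENTS):
        w, theta, s = w_global.copy(), theta_global.copy(), states[i]
        for _ in range(LOCAL_STEPS):
            phi = features(s)
            p = policy_probs(theta, phi)
            a = rng.choice(ACTIONS, p=p)
            s_next, r = step_env(s, a, drifts[i])
            phi_next = features(s_next)

            # Single-loop update: the critic (TD(0)) and the actor (policy
            # gradient with the TD error as advantage) move in the same pass.
            td_error = r + GAMMA * (w @ phi_next) - (w @ phi)
            w += ALPHA_CRITIC * td_error * phi
            grad_log = -np.outer(p, phi)  # grad of log softmax policy
            grad_log[a] += phi
            theta += BETA_ACTOR * td_error * grad_log
            s = s_next
        states[i] = s
        w_locals.append(w)
        theta_locals.append(theta)

    # Two-level aggregation: the server averages critic and actor parameters.
    w_global = np.mean(w_locals, axis=0)
    theta_global = np.mean(theta_locals, axis=0)

print("critic norm:", np.linalg.norm(w_global), "actor norm:", np.linalg.norm(theta_global))
```

Averaging both levels is what lets the shared policy reflect all environments at once; the residual error from doing so is exactly the heterogeneity term that the paper's convergence bound captures.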