Can heterogeneous agents share RL policies privately?
FedHPD: Heterogeneous Federated Reinforcement Learning via Policy Distillation
This paper introduces FedHPD, an approach to Federated Reinforcement Learning (FedRL) designed for heterogeneous agents (different model architectures and training setups) operating in black-box settings (no shared internal model details). FedHPD uses policy distillation to share knowledge among agents: each agent reports its action probability distributions on a public dataset of states, and the aggregated distributions are distilled back into every agent's policy, improving both overall system performance and individual agent learning.

For LLM-based multi-agent systems, the relevance is that FedHPD offers a way for independently trained LLMs (heterogeneous agents) to collaborate and learn from each other without sharing their internal workings (black-box) and without a centrally managed, shared training environment. Because the exchange consists only of output distributions on a public state set, the technique is adaptable to LLM outputs in a variety of applications.
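A minimal sketch of one knowledge-sharing round, assuming PyTorch policy networks that map states to action logits; the function name `distill_round`, the simple mean aggregation, and the KL-based distillation loss are illustrative assumptions rather than the paper's exact algorithm:

```python
import torch
import torch.nn.functional as F

def distill_round(agents, public_states, epochs=1, lr=1e-3):
    """One knowledge-sharing round over a shared public state set.

    `agents` is a list of policy networks (heterogeneous architectures are
    fine, since only action probabilities are exchanged). `public_states` is
    a tensor of states drawn from the public dataset.
    """
    # 1. Each agent reports its action probability distribution on the public states.
    with torch.no_grad():
        local_probs = [F.softmax(agent(public_states), dim=-1) for agent in agents]

    # 2. Aggregate the reported distributions into a consensus (here: a simple mean).
    consensus = torch.stack(local_probs).mean(dim=0)

    # 3. Each agent distills the consensus back into its own policy by
    #    minimizing the KL divergence to the consensus distribution.
    for agent in agents:
        optimizer = torch.optim.Adam(agent.parameters(), lr=lr)
        for _ in range(epochs):
            log_probs = F.log_softmax(agent(public_states), dim=-1)
            loss = F.kl_div(log_probs, consensus, reduction="batchmean")
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

Note that only action probabilities on the public states cross the agent boundary, so each agent's parameters, architecture, and training data remain private, which is what makes the scheme black-box friendly.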