How to efficiently reuse learned skills in offline multi-agent RL?
Few is More: Task-Efficient Skill-Discovery for Multi-Task Offline Multi-Agent Reinforcement Learning
February 14, 2025
https://arxiv.org/pdf/2502.08985
-
This paper introduces SD-CQL, an offline multi-agent RL algorithm that learns general "skills" from a limited set of tasks and applies those skills to new, unseen tasks without retraining. This improves efficiency over existing methods, which require retraining for each new task. Skills are discovered by reconstructing future observations to capture task-agnostic features, and behavior cloning together with conservative Q-learning is then used to optimize how the skills are applied.
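To make the mechanism concrete, below is a minimal PyTorch sketch of the two ideas in the summary: a skill encoder trained by reconstructing the next local observation, and a skill-conditioned Q-network trained with a behavior-cloning term plus a conservative penalty. This is an illustrative reading of the summary, not the paper's implementation; the dimensions, architectures, class names (`SkillDiscovery`, `SkillConditionedQ`), and loss weighting are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, ACT_DIM, SKILL_DIM = 32, 5, 8  # assumed sizes, purely illustrative

class SkillDiscovery(nn.Module):
    """Encode a local observation into a skill latent; decode the next observation."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                     nn.Linear(64, SKILL_DIM))
        self.decoder = nn.Sequential(nn.Linear(OBS_DIM + SKILL_DIM, 64), nn.ReLU(),
                                     nn.Linear(64, OBS_DIM))

    def forward(self, obs, next_obs):
        skill = self.encoder(obs)
        pred_next = self.decoder(torch.cat([obs, skill], dim=-1))
        recon_loss = F.mse_loss(pred_next, next_obs)  # task-agnostic reconstruction objective
        return skill, recon_loss

class SkillConditionedQ(nn.Module):
    """Q-values over discrete actions, conditioned on observation and skill."""
    def __init__(self):
        super().__init__()
        self.q = nn.Sequential(nn.Linear(OBS_DIM + SKILL_DIM, 64), nn.ReLU(),
                               nn.Linear(64, ACT_DIM))

    def forward(self, obs, skill):
        return self.q(torch.cat([obs, skill], dim=-1))

def offline_losses(q_net, skill_net, obs, actions, rewards, next_obs,
                   gamma=0.99, alpha=1.0):
    """Behavior cloning + conservative Q-learning on a batch of offline data."""
    skill, recon_loss = skill_net(obs, next_obs)
    q_values = q_net(obs, skill)

    # Behavior cloning: imitate the dataset actions given the discovered skill.
    bc_loss = F.cross_entropy(q_values, actions)

    # One-step TD target (no target network here, kept minimal on purpose).
    with torch.no_grad():
        next_skill = skill_net.encoder(next_obs)
        target = rewards + gamma * q_net(next_obs, next_skill).max(dim=-1).values
    q_taken = q_values.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    td_loss = F.mse_loss(q_taken, target)

    # Conservative penalty: push down Q on all actions, push up Q on dataset actions.
    cql_penalty = (torch.logsumexp(q_values, dim=-1) - q_taken).mean()

    return recon_loss + bc_loss + td_loss + alpha * cql_penalty
```

The `logsumexp` term is the standard conservative Q-learning regularizer for discrete actions: it suppresses Q-values on out-of-distribution actions while the TD and behavior-cloning terms keep the value of dataset actions high.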
-
SD-CQL is relevant to LLM-based multi-agent systems through its focus on:
- Generalization: Applying skills learned from a small number of example tasks to new, diverse situations is a crucial goal for multi-agent LLM applications.
- Offline Training: SD-CQL learns from existing data without needing to interact with a live environment, which aligns with the use of pre-existing text corpora for training LLMs.
- Scalability: SD-CQL addresses challenges arising from larger numbers of agents, which is relevant to complex LLM-based multi-agent interactions.
- Local Observation Focus: SD-CQL's emphasis on learning from each agent's local observations, rather than relying on global state information, aligns with the decentralized nature of many LLM agent systems.
- Skill Discovery as Feature Extraction: The skill discovery mechanism can be viewed as a form of automated feature extraction, relevant to identifying key features and patterns in the textual data LLMs process (see the sketch after this list).
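Read this way, a trained skill encoder can be reused as a frozen feature extractor for a downstream task. The helper below is hypothetical (`reuse_skills_as_features` is not from the paper); it only illustrates keeping the skill representation fixed while training a small task-specific head.

```python
import torch.nn as nn

def reuse_skills_as_features(skill_encoder: nn.Module, skill_dim: int,
                             n_new_actions: int) -> nn.Module:
    """Freeze a trained skill encoder and attach a small task-specific head."""
    for p in skill_encoder.parameters():
        p.requires_grad_(False)  # the discovered skills act as fixed features
    return nn.Sequential(skill_encoder, nn.Linear(skill_dim, n_new_actions))

# Usage with the sketch above: reuse_skills_as_features(SkillDiscovery().encoder, SKILL_DIM, 4)
```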