How can LLMs best collaborate in Minecraft?
Collaborating Action by Action: A Multi-agent LLM Framework for Embodied Reasoning
April 28, 2025
https://arxiv.org/pdf/2504.17950

This paper explores how Large Language Models (LLMs) can work together to solve tasks in embodied environments, such as the game Minecraft. The researchers introduce MINDcraft, a platform for testing multi-agent collaboration in Minecraft, and MineCollab, a benchmark with collaborative cooking, crafting, and building tasks.
Key points for LLM-based multi-agent systems:
- Communication Bottleneck: Efficient natural language communication is the biggest hurdle for effective LLM collaboration, with performance dropping significantly when agents must communicate detailed plans to one another (see the message-passing sketch after this list).
- Embodied Collaboration Challenges: Combining embodied reasoning (interacting with an environment) and collaboration adds complexity, as agents need to share information, coordinate actions, and manage resources in real-time.
- MINDcraft Platform: Provides a flexible, extensible platform for experimenting with multi-agent LLM systems in a grounded environment, offering tools for instruction following, collaboration, and inter-agent communication.
- MineCollab Benchmark: Offers specific collaborative tasks (cooking, crafting, construction) that require long-horizon planning, environmental interaction, and inter-agent communication.
- Limitations of Current Techniques: Standard LLM techniques like prompting and fine-tuning are insufficient for optimal multi-agent collaboration, highlighting the need for more advanced methods.
- SFT Dataset Generation: The framework enables creation of supervised fine-tuning (SFT) data from successful collaborative runs, which can be used to improve LLM performance on multi-agent tasks (the second sketch below shows one way to convert logged episodes into SFT examples).
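To make the communication bottleneck concrete, here is a minimal sketch of a turn-based, message-passing loop between LLM agents. This is not MINDcraft's actual API: `Agent`, `call_llm`, and the `env.observe` / `env.act` / `env.task_complete` methods are hypothetical placeholders used only to show the shape of the interaction.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for an LLM call; the real MINDcraft agent loop and
# Minecraft controller are considerably more involved.
def call_llm(system_prompt: str, transcript: list[str]) -> str:
    raise NotImplementedError("plug in your LLM client here")

@dataclass
class Agent:
    name: str
    role: str                      # e.g. "collect ingredients", "operate the furnace"
    inbox: list[str] = field(default_factory=list)

    def step(self, observation: str) -> tuple[str, str]:
        """Produce (action, message_to_partner) for one turn."""
        transcript = [f"[observation] {observation}"] + [f"[partner] {m}" for m in self.inbox]
        self.inbox.clear()
        reply = call_llm(
            system_prompt=(f"You are {self.name}. Your role: {self.role}. "
                           "Reply as 'ACTION: ... | MESSAGE: ...'."),
            transcript=transcript,
        )
        action, _, message = reply.partition("| MESSAGE:")
        return action.removeprefix("ACTION:").strip(), message.strip()

def run_episode(agents: list[Agent], env, max_turns: int = 20) -> None:
    """Turn-based loop: each agent acts, then its message is delivered to the others.
    The longer and more detailed these messages must be (e.g. full crafting plans),
    the more each agent's context fills with partner chatter -- the communication
    bottleneck the paper highlights."""
    for _ in range(max_turns):
        for agent in agents:
            obs = env.observe(agent.name)          # hypothetical environment API
            action, message = agent.step(obs)
            env.act(agent.name, action)
            for other in agents:
                if other is not agent and message:
                    other.inbox.append(f"{agent.name}: {message}")
        if env.task_complete():
            break
```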
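And a sketch of the SFT-data idea: converting one agent's side of a successful episode into chat-style fine-tuning examples. The `Turn` record and the JSONL message format are assumptions made for illustration, not the paper's exact logging schema.

```python
import json
from dataclasses import dataclass

@dataclass
class Turn:
    observation: str             # what the agent saw this turn
    partner_messages: list[str]  # messages received from teammates
    action: str                  # the action the agent chose
    message: str                 # what it told its partner

def episode_to_sft_examples(agent_name: str, role: str, turns: list[Turn]) -> list[dict]:
    """Turn one agent's side of a successful episode into chat-style SFT examples.
    Each turn becomes one prompt->target pair whose target is the action and
    message the agent actually produced in the successful run."""
    system = f"You are {agent_name}. Your role: {role}. Reply as 'ACTION: ... | MESSAGE: ...'."
    examples = []
    for turn in turns:
        user = (f"[observation] {turn.observation}\n"
                + "\n".join(f"[partner] {m}" for m in turn.partner_messages))
        target = f"ACTION: {turn.action} | MESSAGE: {turn.message}"
        examples.append({
            "messages": [
                {"role": "system", "content": system},
                {"role": "user", "content": user},
                {"role": "assistant", "content": target},
            ]
        })
    return examples

def write_jsonl(path: str, examples: list[dict]) -> None:
    """Write examples in the JSONL layout most SFT pipelines accept."""
    with open(path, "w", encoding="utf-8") as f:
        for ex in examples:
            f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```

The key design choice is to keep only trajectories that ended in task success, so the fine-tuning targets reflect collaboration that actually worked.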