How can LLMs improve Minecraft multi-agent collaboration?
TeamCraft: A Benchmark for Multi-Modal Multi-Agent Systems in Minecraft
TeamCraft is a Minecraft-based benchmark for evaluating multi-modal, multi-agent AI collaboration on complex tasks such as building, clearing, farming, and smelting. Tasks are specified through multi-modal prompts (images and text), and the environments are varied so that generalization can be tested. Three points matter for LLM-based multi-agent systems. First, agents are controlled from visual observations and inventory information. Second, the benchmark supplies procedurally generated expert demonstrations for imitation learning. Third, current vision-language-action (VLA) models still struggle to generalize, particularly under decentralized control and when they must interpret complex visual scenes. The benchmark suggests that LLMs perform better with textual grid-world representations than with raw visual input, which highlights a need for improved visual processing in LLM-based multi-agent systems.
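
To make the idea of a textual grid-world representation concrete, here is a minimal Python sketch of how a scene and the agents' inventories might be serialized into a text prompt for an LLM. It is an illustration only and is not TeamCraft's actual format or API: the `Agent` class, the `render_grid` and `build_prompt` functions, the symbol table, and the prompt layout are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical symbol table; the real benchmark may encode blocks differently.
BLOCK_SYMBOLS = {"stone": "S", "dirt": "D", "furnace": "F"}


@dataclass
class Agent:
    """Hypothetical agent record: a name, a (row, col) cell, and an inventory."""
    name: str
    position: tuple[int, int]
    inventory: dict[str, int] = field(default_factory=dict)


def render_grid(blocks: dict[tuple[int, int], str], agents: list[Agent],
                rows: int, cols: int) -> str:
    """Render a top-down grid as text, one character per cell."""
    lines = []
    for r in range(rows):
        cells = []
        for c in range(cols):
            occupant = next((a for a in agents if a.position == (r, c)), None)
            if occupant is not None:
                # Agents are shown by the lowercase first letter of their name.
                cells.append(occupant.name[0].lower())
            elif (r, c) in blocks:
                # Unknown block types fall back to "?".
                cells.append(BLOCK_SYMBOLS.get(blocks[(r, c)], "?"))
            else:
                cells.append(".")
        lines.append(" ".join(cells))
    return "\n".join(lines)


def build_prompt(task: str, blocks: dict[tuple[int, int], str],
                 agents: list[Agent], rows: int, cols: int) -> str:
    """Combine the task text, the grid, and per-agent inventories into one prompt."""
    agent_lines = []
    for a in agents:
        items = ", ".join(f"{k} x{v}" for k, v in sorted(a.inventory.items()))
        agent_lines.append(f"- {a.name} at {a.position}: {items or 'empty'}")
    return "\n".join([
        f"Task: {task}",
        "Grid (. = empty):",
        render_grid(blocks, agents, rows, cols),
        "Agents:",
        *agent_lines,
    ])


if __name__ == "__main__":
    blocks = {(0, 0): "stone", (1, 2): "furnace"}
    agents = [Agent("Alice", (0, 1), {"coal": 2}), Agent("Bob", (1, 0))]
    print(build_prompt("Smelt one iron ingot.", blocks, agents, rows=2, cols=3))
```

A prompt built this way gives the model block positions and inventories directly as tokens, which is the kind of input the summary above reports LLMs handling better than raw images. In a decentralized setting, one such prompt would be built per agent from that agent's own observations rather than from a shared global view.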