Can LLMs build complex Factorio factories?
Factorio Learning Environment
March 15, 2025
https://arxiv.org/pdf/2503.09617
This paper introduces the Factorio Learning Environment (FLE), a benchmark built on the video game Factorio that evaluates how well Large Language Models (LLMs) act as agents in complex, open-ended tasks requiring planning, resource management, and automation.
Key points for LLM-based multi-agent systems:
- FLE allows for evaluating LLMs in both structured (lab-play) and unbounded (open-play) settings, providing a scalable challenge as model capabilities increase.
- Agents interact with FLE by writing and executing Python code, enabling evaluation of their program synthesis abilities within a resource-constrained simulated world.
- Initial evaluations of current LLMs reveal limitations in spatial reasoning, long-term planning, and error correction, especially as task complexity increases.
- LLMs exhibit different coding styles and debugging strategies when interacting with FLE, highlighting diverse approaches to problem-solving.
- FLE facilitates research into emergent behaviors in multi-agent systems, such as cooperation, competition, and resource allocation strategies, which is relevant for AI safety research.
- FLE provides a valuable platform for exploring how LLMs might perform in real-world resource management and automation scenarios, including potential safety risks.
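
The code-as-action loop described in the key points (the agent writes a Python program, the environment executes it, and the output or error is fed back for correction) can be sketched roughly as follows. This is only an illustrative toy, not the FLE API: `ToyWorld`, `smelt`, `run_program`, and the item names are hypothetical and chosen for this example.

```python
import contextlib
import io
import traceback


class ToyWorld:
    """Hypothetical stand-in for a game world; this is NOT the FLE API."""

    def __init__(self):
        self.inventory = {"iron_ore": 10, "iron_plate": 0}

    def smelt(self, count):
        # Converting ore to plates fails if the program over-spends resources.
        if count > self.inventory["iron_ore"]:
            raise ValueError(
                f"need {count} iron_ore, have {self.inventory['iron_ore']}"
            )
        self.inventory["iron_ore"] -= count
        self.inventory["iron_plate"] += count
        return self.inventory["iron_plate"]


def run_program(world, source):
    """Execute agent-written code against the world; return (ok, feedback).

    Note: exec() is not a sandbox. It is used here only to keep the sketch short.
    """
    namespace = {"smelt": world.smelt, "inventory": world.inventory}
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(source, namespace)
    except Exception:
        # The final traceback line is the error message the agent would see.
        last_line = traceback.format_exc().strip().splitlines()[-1]
        return False, buffer.getvalue() + last_line
    return True, buffer.getvalue()


world = ToyWorld()
# The first program over-spends ore; the second stands in for a corrected retry.
attempts = [
    "print(smelt(25))",
    "print(smelt(inventory['iron_ore']))",
]
results = [run_program(world, src) for src in attempts]
for ok, feedback in results:
    print(ok, feedback.strip())
```

In this sketch the first program fails with a resource error and the second succeeds, which mirrors the error-correction behavior the evaluation looks at. In a real setup the retry would be generated by the model from the feedback string rather than hard-coded.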