How can I make my LLM GUI agent robust to varying initial states?
WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation
This paper introduces WorldGUI, a benchmark that tests GUI automation agents on dynamic, real-world scenarios in which the same task can start from varying initial states. It also proposes GUI-Thinker, an agent framework that applies critical-thinking principles through modules such as Planner-Critic, Step-Check, and Actor-Critic. Three points are most relevant to LLM-based multi-agent systems: adapting the task plan dynamically to the initial state and to environment feedback; pairing text queries with instructional videos to give richer context for complex tasks; and using critical-thinking modules to improve action planning, validation, and correction. WorldGUI supports robust evaluation of these capabilities, showcasing the potential for LLMs to handle complex, realistic GUI automation tasks.
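To make the idea of adapting to the initial state concrete, here is a minimal sketch of a plan, check, act, verify loop. It is not GUI-Thinker's implementation. The module names come from the summary above, but the role each one plays here is an assumed reading of "planning, validation, and correction". Everything else (`Step`, `planner_critic`, `run`, the flag-based state, the retry limit) is hypothetical and stands in for real screenshots, LLM calls, and GUI actions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Toy stand-in for a GUI: a dict of boolean flags (hypothetical, not from the paper).
State = Dict[str, bool]


@dataclass
class Step:
    name: str
    done_when: Callable[[State], bool]  # True once the step's goal holds
    act: Callable[[State], None]        # mutates the state, like a click or keystroke


def planner_critic(plan: List[Step]) -> List[Step]:
    """Assumed role: review the plan before execution.

    Here the review only drops repeated step names, keeping the first.
    """
    seen = set()
    reviewed = []
    for step in plan:
        if step.name not in seen:
            seen.add(step.name)
            reviewed.append(step)
    return reviewed


def run(plan: List[Step], state: State, max_retries: int = 2) -> List[str]:
    """Execute a plan against whatever state the environment starts in."""
    log = []
    for step in planner_critic(plan):
        # Assumed Step-Check role: skip steps the initial state already satisfies.
        if step.done_when(state):
            log.append(f"skip:{step.name}")
            continue
        for _ in range(max_retries + 1):
            step.act(state)
            # Assumed Actor-Critic role: verify the action took effect, else retry.
            if step.done_when(state):
                log.append(f"done:{step.name}")
                break
        else:
            # Retries exhausted: record the failure and stop executing the plan.
            log.append(f"fail:{step.name}")
            break
    return log


def flag_is_set(flag: str) -> Callable[[State], bool]:
    return lambda state: state.get(flag, False)


def set_flag(flag: str) -> Callable[[State], None]:
    def act(state: State) -> None:
        state[flag] = True
    return act


# One plan, two initial states: a closed app and a half-finished session.
PLAN = [
    Step(name, flag_is_set(name), set_flag(name))
    for name in ("app_open", "title_selected", "bold_applied")
]

fresh = run(PLAN, {})
partial = run(PLAN, {"app_open": True, "title_selected": True})
print(fresh)
print(partial)
```

The same plan runs all three steps from a clean start and only the last step when the app is already open with the title selected. Checking each step's goal against the observed state, rather than assuming a fixed starting point, is what keeps the loop from repeating or breaking on work that is already done.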