Can LLMs collaboratively reason using scene graphs?
A Schema-Guided Reason-while-Retrieve framework for Reasoning on Scene Graphs with Large-Language-Models (LLMs)
This paper introduces SG-RwR, a framework for enhancing LLM reasoning on scene graphs, useful for tasks like spatial planning and question answering. It employs two cooperating LLM agents, a Reasoner and a Retriever, that communicate iteratively. Crucially, both agents work from the scene graph schema rather than the full graph data, which improves efficiency and reduces hallucinations. The Retriever uses this schema to generate code that queries the graph dynamically, supplying the Reasoner with targeted information. The Reasoner can also write code that calls external tools for numerical reasoning and sub-problem solving, which further improves accuracy and lets it tackle more complex tasks. Experiments show that SG-RwR outperforms baseline methods, particularly in few-shot settings and complex environments. Key to its success is the combination of iterative reasoning, a two-agent design, and the schema-guided code-writing approach.
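The loop described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the toy graph, the request strings such as `position_of:mug`, and the functions `extract_schema`, `stub_reasoner`, `stub_retriever`, `run_query`, and `reason_while_retrieve` are all hypothetical names chosen for illustration, and both agents are hard-coded stubs standing in for LLM calls. The sketch only shows the information flow: the agents see the schema, the Retriever emits query code that is run against the full graph, and the Reasoner hands the final numerical step to a tool.

```python
import math

# Toy scene graph (hypothetical data, not from the paper).
SCENE_GRAPH = {
    "nodes": {
        "kitchen": {"type": "room"},
        "hall": {"type": "room"},
        "mug": {"type": "object", "position": (1.0, 2.0)},
        "key": {"type": "object", "position": (4.0, 6.0)},
    },
    "edges": [
        ("kitchen", "contains", "mug"),
        ("hall", "contains", "key"),
        ("kitchen", "connected_to", "hall"),
    ],
}


def extract_schema(graph):
    """Summarize node types, attribute names, and relation labels only."""
    node_types = {}
    for attrs in graph["nodes"].values():
        names = node_types.setdefault(attrs["type"], set())
        names.update(k for k in attrs if k != "type")
    relations = {rel for _, rel, _ in graph["edges"]}
    return {
        "node_types": {t: sorted(a) for t, a in node_types.items()},
        "relations": sorted(relations),
    }


def stub_retriever(request, schema):
    """Stand-in for the Retriever LLM: returns query code as a string.

    A real Retriever would write this code from the schema; here the
    mapping is hard-coded for a single request type.
    """
    if request.startswith("position_of:"):
        obj = request.split(":", 1)[1]
        return f"result = graph['nodes'][{obj!r}]['position']"
    raise ValueError(f"unsupported request: {request}")


def run_query(code, graph):
    """Execute generated query code against the full graph."""
    namespace = {"graph": graph}
    exec(code, namespace)  # acceptable for a toy; sandbox this in practice
    return namespace["result"]


def stub_reasoner(question, schema, evidence):
    """Stand-in for the Reasoner LLM.

    Returns ("retrieve", request) while information is missing, and
    ("answer", value) once it can solve the task.
    """
    for obj in ("mug", "key"):
        request = f"position_of:{obj}"
        if request not in evidence:
            return ("retrieve", request)
    # Numerical sub-problem delegated to a tool instead of "mental" arithmetic.
    distance = math.dist(evidence["position_of:mug"], evidence["position_of:key"])
    return ("answer", distance)


def reason_while_retrieve(question, graph, max_turns=5):
    """Alternate Reasoner and Retriever turns until an answer is produced."""
    schema = extract_schema(graph)  # the only graph description the agents see
    evidence = {}
    for _ in range(max_turns):
        action, payload = stub_reasoner(question, schema, evidence)
        if action == "answer":
            return payload
        code = stub_retriever(payload, schema)
        evidence[payload] = run_query(code, graph)
    raise RuntimeError("no answer within the turn budget")


if __name__ == "__main__":
    print(extract_schema(SCENE_GRAPH))
    print(reason_while_retrieve("How far is the mug from the key?", SCENE_GRAPH))
```

In this sketch the full graph is touched only inside `run_query`, so the size of the prompt context would depend on the schema and the retrieved evidence rather than on the number of nodes and edges, which is the efficiency argument the summary makes.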