How can LLMs understand noisy instructions?
SIFTOM: Robust Spoken Instruction Following through Theory of Mind
September 18, 2024
https://arxiv.org/pdf/2409.10849

This paper introduces SIFTOM, a system that helps robots understand spoken instructions even when those instructions are noisy or unclear. It does this by combining speech recognition (ASR) with a "Theory of Mind" (ToM) model. The ToM model lets the robot reason about what the human wants based on the task context and their actions, similar to how humans infer each other's intentions.
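The core idea can be sketched as reranking noisy ASR hypotheses with a context-derived prior over the speaker's likely goals. The snippet below is a minimal illustration of that intuition, not the paper's actual implementation; the `rerank` function, its scoring rule, and the example values are all hypothetical.

```python
# Hypothetical sketch of ToM-style reranking (not SIFTOM's actual code):
# combine ASR confidence with a context-based prior over intended commands.

def rerank(asr_hypotheses, goal_prior):
    """asr_hypotheses: list of (transcript, asr_confidence) pairs.
    goal_prior: dict mapping transcript -> P(speaker intends this | context).
    Returns the transcript with the highest joint score."""
    scored = [
        # joint score is proportional to P(audio | text) * P(text | context)
        (text, conf * goal_prior.get(text, 1e-6))
        for text, conf in asr_hypotheses
    ]
    return max(scored, key=lambda s: s[1])[0]

# Example: ASR slightly prefers "grab the ham", but the visible scene
# (a hammer on the table, no ham in sight) makes "grab the hammer" far
# more plausible, so context flips the decision.
hyps = [("grab the ham", 0.55), ("grab the hammer", 0.45)]
prior = {"grab the ham": 0.05, "grab the hammer": 0.9}
print(rerank(hyps, prior))  # -> grab the hammer
```

This mirrors the paper's claim at a high level: visual and task context can override a raw ASR ranking when the top transcript conflicts with the human's plausible goals.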
For LLM-based multi-agent systems, SIFTOM demonstrates:
- Robustness to noisy speech: SIFTOM interprets instructions in noisy environments more accurately than systems that rely on speech recognition alone.
- Contextual understanding: by combining visual information with ToM reasoning, SIFTOM can correctly interpret ambiguous commands where ASR alone would fail.
- Efficiency in collaboration: SIFTOM's ability to understand intentions leads to smoother and faster human-robot collaboration.