Can LLMs improve interactive motion analysis?
ChatMotion: A Multimodal Multi-Agent for Human Motion Analysis
ChatMotion is a new multi-agent framework for analyzing human motion in videos and motion-capture data. It combines multiple LLMs specialized for motion analysis and video captioning to overcome the limitations of single-LLM approaches, such as inherent biases and poor adaptability to complex user queries. Key to ChatMotion's design is its modular "MotionCore", a toolkit housing specialized components (a multi-LLM aggregator, an analyzer, and a generator) that are coordinated by a planner and overseen by a verifier. With this architecture, ChatMotion dynamically decomposes user requests, accesses and combines results from multiple models, verifies their consistency, and refines the analysis, improving accuracy and user engagement across diverse tasks such as action recognition and motion reasoning.
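To make the planner/MotionCore/verifier flow concrete, here is a minimal Python sketch of such a loop. It is an illustration only, assuming hypothetical names (`plan`, `aggregator`, `analyzer`, `verify`, `chatmotion`); it is not the authors' actual API or implementation.

```python
# Minimal sketch of a planner -> tools -> verifier loop in the spirit of
# ChatMotion's architecture. All names below are illustrative assumptions,
# not the authors' code.

from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    tool: str    # which MotionCore tool to call
    query: str   # sub-question derived from the user request


def plan(user_request: str) -> List[Step]:
    """Hypothetical planner: decompose the request into tool calls."""
    steps = [Step(tool="analyzer", query=user_request)]
    if "why" in user_request.lower() or "explain" in user_request.lower():
        steps.append(Step(tool="aggregator", query=user_request))
    return steps


def aggregator(query: str) -> str:
    """Stand-in for a multi-LLM aggregator: query several motion LLMs
    and merge their answers (here, trivially concatenated)."""
    answers = [f"[model-{i}] answer to '{query}'" for i in range(2)]
    return " | ".join(answers)


def analyzer(query: str) -> str:
    """Stand-in for a motion analyzer over video / mocap features."""
    return f"analysis of '{query}'"


TOOLS: Dict[str, Callable[[str], str]] = {
    "aggregator": aggregator,
    "analyzer": analyzer,
}


def verify(results: List[str]) -> bool:
    """Hypothetical verifier: check that collected results are non-empty
    and mutually consistent before composing the final answer."""
    return all(results)  # placeholder consistency check


def chatmotion(user_request: str) -> str:
    steps = plan(user_request)
    results = [TOOLS[s.tool](s.query) for s in steps]
    if not verify(results):
        # In the real system the verifier would trigger re-planning and
        # refinement; here we simply re-run the plan once.
        results = [TOOLS[s.tool](s.query) for s in plan(user_request)]
    return "\n".join(results)


if __name__ == "__main__":
    print(chatmotion("What action is the person performing, and why?"))
```

The point of the sketch is the control flow: the planner splits the request, each sub-task is routed to a tool that may fan out to several models, and the verifier gates the final answer and can send the pipeline back for refinement.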