How can agents improve VLMs without bigger models?
AIDE: Agentically Improve Visual Language Model with Domain Experts
This paper introduces AIDE, a framework for improving Visual Language Models (VLMs) by leveraging specialized external expert models (such as OCR or object-detection systems) rather than relying on larger, more general models for knowledge distillation. AIDE uses two agents: a "Selector" that identifies where the VLM falls short and chooses the relevant expert models, and a "Synthesizer" that integrates the experts' outputs with existing data to produce enhanced training examples. This multi-agent approach offers a more scalable and efficient way to improve VLMs, especially when larger teacher models are unavailable. For LLM-based multi-agent systems, the key takeaway is that independent agents with specialized skills (here, the expert models) can collaboratively improve a central LLM (the VLM), with the "Selector" agent embodying the system's decision-making. The framework opens possibilities for similar agent-based improvement in other LLM applications.
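To make the Selector/Synthesizer loop concrete, below is a minimal Python sketch of one AIDE-style data-improvement pass. Everything here (the `TrainingExample` class, the expert registry, the selection heuristics, and the fusion logic) is an illustrative assumption about how such a pipeline could be wired, not the paper's actual implementation.

```python
# Minimal sketch of an AIDE-style improvement loop.
# All names, interfaces, and heuristics are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable

@dataclass
class TrainingExample:
    image_path: str
    question: str
    answer: str

# Hypothetical domain-expert models keyed by capability.
EXPERTS: dict[str, Callable[[str], str]] = {
    "ocr": lambda image: "detected text ...",          # placeholder OCR expert
    "detection": lambda image: "detected objects ...",  # placeholder detector
}

def selector(example: TrainingExample, vlm_answer: str) -> list[str]:
    """Selector agent: decide which experts could fill the VLM's gap here."""
    chosen = []
    if not vlm_answer.strip() or "text" in example.question.lower():
        chosen.append("ocr")
    if "how many" in example.question.lower():
        chosen.append("detection")
    return chosen

def synthesizer(example: TrainingExample,
                expert_outputs: dict[str, str]) -> TrainingExample:
    """Synthesizer agent: fuse expert outputs with the original example."""
    context = "; ".join(f"{name}: {out}" for name, out in expert_outputs.items())
    enriched = f"{example.answer} (grounded in expert outputs: {context})"
    return TrainingExample(example.image_path, example.question, enriched)

def improve_dataset(dataset: list[TrainingExample],
                    vlm: Callable[[TrainingExample], str]) -> list[TrainingExample]:
    """One pass: probe the VLM, select experts, synthesize enhanced examples."""
    improved = []
    for ex in dataset:
        vlm_answer = vlm(ex)                        # current model's attempt
        experts = selector(ex, vlm_answer)          # pick relevant experts
        outputs = {name: EXPERTS[name](ex.image_path) for name in experts}
        improved.append(synthesizer(ex, outputs) if outputs else ex)
    return improved
```

The enriched dataset produced by `improve_dataset` would then be used to fine-tune the VLM, in place of distillation targets from a larger teacher model.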