Can LLMs build image processing apps?
VisionCoder: Empowering Multi-Agent Auto-Programming for Image Processing with Hybrid LLMs
This research introduces VisionCoder, a multi-agent AI framework that automates the generation of Python code for image processing tasks. VisionCoder uses a hierarchical structure modeled on real-world development teams, breaking a project down into modules and then into individual functions. This decomposition lets VisionCoder apply LLMs where they are strongest, at function-level code generation, while mitigating their limitations on large-scale projects. VisionCoder also takes a hybrid model approach: proprietary models such as GPT-4 handle high-level decision-making, while open-source models handle code generation, balancing performance against cost. In addition, it integrates retrieval-augmented generation (RAG) and pair programming to further enhance its capabilities.
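The hierarchy and the hybrid model split can be illustrated with a minimal Python sketch. This is not VisionCoder's actual implementation: the names `plan_project`, `generate_project`, `stub_planner`, and `stub_coder` are hypothetical, and both "models" are hard-coded stubs standing in for a proprietary planner and an open-source code model. RAG and pair programming are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption: a "model" is any function from prompt text to response text.
# Real LLM calls are replaced by stubs so the sketch runs offline.
Model = Callable[[str], str]


@dataclass
class FunctionSpec:
    name: str
    description: str


@dataclass
class ModuleSpec:
    name: str
    functions: List[FunctionSpec] = field(default_factory=list)


def stub_planner(prompt: str) -> str:
    """Hypothetical stand-in for a proprietary model doing high-level planning."""
    return "io: load_image, save_image\nfilters: to_grayscale"


def stub_coder(prompt: str) -> str:
    """Hypothetical stand-in for an open-source code generation model."""
    name = prompt.split()[-1]  # the prompt ends with the function name
    return f"def {name}(*args, **kwargs):\n    raise NotImplementedError\n"


def plan_project(task: str, planner: Model) -> List[ModuleSpec]:
    """Project level: ask the planner to split the task into modules and functions."""
    modules = []
    plan = planner(f"Decompose into modules and functions: {task}")
    for line in plan.splitlines():
        module_name, _, names = line.partition(":")
        module_name = module_name.strip()
        functions = [
            FunctionSpec(n.strip(), f"part of module {module_name}")
            for n in names.split(",")
            if n.strip()
        ]
        modules.append(ModuleSpec(module_name, functions))
    return modules


def generate_project(task: str, planner: Model, coder: Model) -> Dict[str, str]:
    """Route planning to one model and function-level generation to another."""
    files = {}
    for module in plan_project(task, planner):
        bodies = [
            coder(f"Write a Python function named {fn.name}")
            for fn in module.functions
        ]
        files[f"{module.name}.py"] = "\n".join(bodies)
    return files


project = generate_project("Convert an image to grayscale", stub_planner, stub_coder)
for filename in sorted(project):
    print(filename)
```

The point of the design is that the expensive model is called once per project for planning, while the cheaper model is called once per function, which is where most of the calls occur.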