How best to choose LLMs for compound AI systems?
Optimizing Model Selection for Compound AI Systems
This paper explores how to choose the best large language model (LLM) for each step ("module") within a multi-step AI system (for example, a pipeline that first generates an answer, then refines it based on feedback). The key observation is that different LLMs excel at different subtasks. Using a framework called LLMSELECTOR, the authors show that selecting the right LLM for each module, rather than using a single model throughout, substantially improves end-to-end performance (reported gains of 5-70% in some cases). LLMSELECTOR uses an LLM "judge" to estimate how well a candidate model would perform on a given module, then iteratively updates the allocation one module at a time. This iterative process yields a better model allocation than tuning prompts alone or focusing only on module interactions. The work is relevant to LLM-based multi-agent systems because it offers a principled way to optimize model selection within such pipelines, improving both efficiency and performance.
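The per-module selection idea can be sketched as a coordinate-descent loop: hold every module's model fixed except one, swap in each candidate LLM, score the resulting system with the judge, and keep the best. The sketch below is a toy illustration, not the paper's implementation: the module names, model names, and the `JUDGE_SCORES` table are all hypothetical stand-ins for real LLM calls and judge estimates.

```python
# Hypothetical setup: two modules and three candidate LLMs.
# JUDGE_SCORES stands in for the paper's LLM "judge": a fixed table of
# per-module quality estimates instead of live model evaluations.
MODULES = ["generator", "critic"]
MODELS = ["model_a", "model_b", "model_c"]

JUDGE_SCORES = {
    ("generator", "model_a"): 0.60,
    ("generator", "model_b"): 0.85,
    ("generator", "model_c"): 0.70,
    ("critic", "model_a"): 0.90,
    ("critic", "model_b"): 0.55,
    ("critic", "model_c"): 0.75,
}

def system_score(allocation):
    """Toy end-to-end metric: average of per-module judge scores."""
    return sum(JUDGE_SCORES[(m, allocation[m])] for m in MODULES) / len(MODULES)

def select_models(rounds=3):
    """Coordinate descent over modules: fix all modules but one and
    swap in the LLM that maximizes the judged system score."""
    allocation = {m: MODELS[0] for m in MODULES}  # start: one model everywhere
    for _ in range(rounds):
        for module in MODULES:
            allocation[module] = max(
                MODELS,
                key=lambda model: system_score({**allocation, module: model}),
            )
    return allocation
```

Under this toy score table, `select_models()` assigns `model_b` to the generator and `model_a` to the critic, beating any single-model allocation, which mirrors the paper's core finding that heterogeneous allocations can outperform using one LLM everywhere.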