How to architect scalable LLM apps?
A Layered Architecture for Developing and Enhancing Capabilities in Large Language Model-based Software Systems
This paper proposes a layered architecture for LLM-based software systems, comprising Model, Inference, and Application layers, and focuses on systematically implementing and enhancing capabilities beyond an LLM's native abilities. It emphasizes mapping desired functionalities to specific layers and components within the architecture, weighing attributes such as knowledge boundaries, efficiency, and control over token generation. This structured approach aims to improve the development process and promote robustness and scalability in LLM-based applications, particularly multi-agent systems, where orchestrating multiple LLMs and tool integrations is crucial.

For LLM-based multi-agent systems, the key points are: mechanism engineering to represent complex workflows, tooling for interaction with external systems, and orchestration for managing state and chaining LLM calls. At each layer, the framework encourages weighing trade-offs between implementation approaches, such as fine-tuning versus retrieval augmentation for incorporating knowledge, which in turn guides the selection of technologies for specific functionalities within a multi-agent system.
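To make the layering concrete, here is a minimal, runnable Python sketch of the three layers, under my own assumptions (the class names `StubModel`, `RetrievalInference`, and `Orchestrator` are hypothetical, not from the paper, and the stub model and keyword-overlap retriever stand in for a real LLM backend and an embedding-based retriever):

```python
# Model layer: a stub standing in for any LLM backend.
# Assumption: a real system would call an actual model API here.
class StubModel:
    def generate(self, prompt: str) -> str:
        # Echo-style stand-in so the sketch runs without a real LLM.
        return f"ANSWER(based on: {prompt[:60]})"

# Inference layer: retrieval augmentation, i.e. injecting external
# knowledge into the prompt rather than fine-tuning it into the weights.
class RetrievalInference:
    def __init__(self, model: StubModel, corpus: list[str]):
        self.model = model
        self.corpus = corpus

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        # Naive keyword-overlap scoring; a real system would use embeddings.
        q = set(query.lower().split())
        scored = sorted(self.corpus,
                        key=lambda d: -len(q & set(d.lower().split())))
        return scored[:k]

    def answer(self, query: str) -> str:
        context = "\n".join(self.retrieve(query))
        return self.model.generate(f"Context:\n{context}\nQuestion: {query}")

# Application layer: orchestration, managing conversation state and
# chaining calls (routing to agents or tools would also live here).
class Orchestrator:
    def __init__(self, inference: RetrievalInference):
        self.inference = inference
        self.history: list[tuple[str, str]] = []  # conversation state

    def handle(self, query: str) -> str:
        reply = self.inference.answer(query)
        self.history.append((query, reply))
        return reply

corpus = ["LLM apps scale via layering",
          "Retrieval injects fresh knowledge"]
app = Orchestrator(RetrievalInference(StubModel(), corpus))
print(app.handle("How does retrieval help LLM apps?"))
```

The point of the sketch is the dependency direction: the Application layer holds state and orchestrates, the Inference layer decides how knowledge reaches the prompt, and the Model layer stays swappable, so a trade-off like fine-tuning vs. retrieval augmentation is isolated to one layer.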