How to speed up LLM communication?
DroidSpeak: Enhancing cross-LLM communication
November 6, 2024
https://arxiv.org/pdf/2411.02820

DroidSpeak speeds up communication between large language model (LLM) agents, particularly between fine-tuned versions of the same base model. It exploits the similarity between such models by selectively reusing intermediate computation results (embeddings and key-value caches) from the sender LLM, cutting redundant processing on the receiver's end and significantly accelerating interactions without substantial accuracy loss. This targets prefill latency, the bottleneck that dominates communication time in multi-agent LLM systems.
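To make the core idea concrete, here is a minimal sketch (not the authors' implementation) of KV-cache reuse between two fine-tuned variants of a shared base model, using the Hugging Face transformers API. The model names are placeholders, and the paper's actual system adds layer-selective reuse logic to preserve accuracy, which is omitted here: the sender's prefill produces a KV cache over the shared context, and the receiver consumes that cache directly instead of re-prefilling the same tokens.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Llama-2-7b-hf"        # assumed shared base architecture
SENDER = "sender-finetune-of-base"       # hypothetical fine-tuned sender agent
RECEIVER = "receiver-finetune-of-base"   # hypothetical fine-tuned receiver agent

tok = AutoTokenizer.from_pretrained(BASE)
sender = AutoModelForCausalLM.from_pretrained(SENDER, torch_dtype=torch.float16)
receiver = AutoModelForCausalLM.from_pretrained(RECEIVER, torch_dtype=torch.float16)

# Shared conversation context that the sender has already processed.
context = "Shared multi-agent context both agents need to condition on..."
ctx_ids = tok(context, return_tensors="pt").input_ids

# The sender's prefill over the context yields a KV cache as a byproduct.
with torch.no_grad():
    sender_out = sender(ctx_ids, use_cache=True)
kv_cache = sender_out.past_key_values  # the artifact that gets shipped across agents

# The receiver reuses that cache instead of re-running prefill on ctx_ids;
# it only computes attention for its own new tokens.
new_text = " Receiver's follow-up question:"
new_ids = tok(new_text, return_tensors="pt", add_special_tokens=False).input_ids
with torch.no_grad():
    receiver_out = receiver(
        new_ids,
        past_key_values=kv_cache,  # skips prefill of the shared prefix
        use_cache=True,
    )
next_token = receiver_out.logits[:, -1].argmax(dim=-1)
print(tok.decode(next_token))
```

Because both agents are fine-tuned from the same base, the cache tensors are shape-compatible; the open question DroidSpeak addresses is which layers' caches can be reused without hurting the receiver's output quality.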