How can I efficiently optimize many agents' shared payoff?
Mean-Field Bayesian Optimisation
This paper introduces MF-GP-UCB, a Bayesian Optimization algorithm for optimizing the average payoff of cooperative multi-agent systems when the payoff function is unknown. It relies on the mean-field assumption: agents are treated as interchangeable, so the payoff depends on the overall distribution of actions rather than on which individual agent takes which action. Because the algorithm works with that distribution instead of one input per agent, it scales independently of the number of agents.
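To make the mean-field idea concrete, here is a minimal sketch rather than the paper's MF-GP-UCB itself. It assumes a small discrete action set, summarizes the population by the fraction of agents choosing each action, and runs a plain GP-UCB loop over candidate distributions. The toy payoff, the TARGET vector, the RBF kernel, the fixed beta, and the random candidate set are all illustrative assumptions and are not taken from the paper.

```python
import numpy as np

# Hypothetical optimum of the toy payoff: the ideal split of agents over 3 actions.
TARGET = np.array([0.6, 0.3, 0.1])

def empirical_distribution(actions, num_actions):
    """Fraction of agents choosing each action; its size ignores the agent count."""
    counts = np.bincount(np.asarray(actions), minlength=num_actions)
    return counts / counts.sum()

def toy_average_payoff(dist):
    """Stand-in for the unknown black box (assumption): peaks at TARGET."""
    return -np.sum((dist - TARGET) ** 2)

def rbf_kernel(A, B, lengthscale=0.3):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)

def gp_posterior(X, y, Xs, noise=1e-3):
    """Zero-mean GP posterior mean and standard deviation at the points Xs."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xs)
    mean = Ks.T @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum("ij,ij->j", Ks, np.linalg.solve(K, Ks))
    return mean, np.sqrt(np.clip(var, 0.0, None))

def run_ucb(num_rounds=30, num_candidates=500, beta=2.0, seed=0):
    """GP-UCB over action distributions instead of per-agent actions."""
    rng = np.random.default_rng(seed)
    # Candidate population-level action distributions on the simplex.
    candidates = rng.dirichlet(np.ones(len(TARGET)), size=num_candidates)
    X = candidates[:1]
    y = np.array([toy_average_payoff(X[0])])
    for _ in range(num_rounds):
        mean, std = gp_posterior(X, y, candidates)
        pick = candidates[np.argmax(mean + beta * std)]  # upper confidence bound
        X = np.vstack([X, pick])
        y = np.append(y, toy_average_payoff(pick))
    best = np.argmax(y)
    return X[best], y[best]

best_dist, best_payoff = run_ucb()
```

In this sketch, the surrogate model's input has one entry per action, so ten agents and ten thousand agents with the same action proportions map to the same point. That is the sense in which the search space does not grow with the population.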
Key points for LLM-based multi-agent systems: MF-GP-UCB targets the optimization of complex, unknown objective functions in multi-agent settings, which bears directly on training and coordinating LLMs in collaborative tasks. The mean-field approach keeps the problem computationally tractable for large numbers of agents (LLMs). The algorithm's regret bound, which measures how much payoff is lost relative to the best achievable choice over the course of optimization, is independent of the number of agents, whereas traditional methods struggle to scale as agents are added. That makes it particularly promising for systems involving many LLMs. The paper also explores contexts (agent types), which could represent different LLM roles or specializations.