How can I make federated LLMs safer?
Toward Responsible Federated Large Language Models: Leveraging a Safety Filter and Constitutional AI
This paper explores training large language models (LLMs) responsibly in a federated learning setting (FedLLM). It addresses the risk of generating harmful content by incorporating two safety mechanisms: a safety filter (Llama Guard 3) applied to each client's data before local training, and Constitutional AI (CAI) applied to the global model after aggregation. Experiments show these methods improve LLM safety by over 20% on a safety benchmark, and a cost-efficient CAI approach is also introduced to reduce computational overhead. The distributed training aspect of FedLLM is what makes it relevant to multi-agent systems: multiple client models (agents) collaboratively train a global model while keeping their data private, and the two safety mechanisms ensure that each agent contributes responsibly and that the resulting global model behaves safely.
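To make the order of operations concrete, here is a minimal sketch of one federated round with the two safety stages in the positions the paper describes: data filtering on each client before local training, and CAI alignment on the server after aggregation. All helper names (safety_filter, is_safe, local_finetune, fedavg, cai_align) are illustrative placeholders invented for this sketch, not APIs from the paper or from Llama Guard 3.

```python
# Hypothetical sketch of the two-stage safety pipeline; helpers are stubs.
from typing import Dict, List


def is_safe(example: str) -> bool:
    """Placeholder for a safety classifier call (e.g. Llama Guard 3)."""
    return True  # real code would query the classifier


def safety_filter(examples: List[str]) -> List[str]:
    """Stage 1: drop unsafe examples from a client's data before training."""
    return [ex for ex in examples if is_safe(ex)]


def local_finetune(global_weights: Dict[str, float],
                   data: List[str]) -> Dict[str, float]:
    """Placeholder for one round of local fine-tuning on filtered data."""
    return dict(global_weights)  # real code would return updated weights


def fedavg(client_weights: List[Dict[str, float]]) -> Dict[str, float]:
    """Server-side aggregation: average client updates (FedAvg-style)."""
    keys = client_weights[0].keys()
    return {k: sum(w[k] for w in client_weights) / len(client_weights)
            for k in keys}


def cai_align(global_weights: Dict[str, float]) -> Dict[str, float]:
    """Stage 2: placeholder for post-aggregation Constitutional AI alignment
    (critique/revise model outputs against a constitution, then fine-tune)."""
    return dict(global_weights)


def federated_round(global_weights: Dict[str, float],
                    client_datasets: List[List[str]]) -> Dict[str, float]:
    # Each client filters its data, then fine-tunes the global model locally.
    updates = [local_finetune(global_weights, safety_filter(data))
               for data in client_datasets]
    # The server aggregates client updates, then applies CAI to the result.
    return cai_align(fedavg(updates))


if __name__ == "__main__":
    initial = {"w": 0.0}
    clients = [["example prompt A"], ["example prompt B"]]
    print(federated_round(initial, clients))
```

The sketch only fixes where the two mechanisms sit in the pipeline; the actual filtering, fine-tuning, and CAI procedures in the paper are far more involved.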