How can LLMs be trained on unstructured text data?
Leveraging Unstructured Text Data for Federated Instruction Tuning of Large Language Models
September 12, 2024
https://arxiv.org/pdf/2409.07136

This paper proposes FedIT-U2S, a framework for training large language models (LLMs) collaboratively without sharing private data (federated learning). The key innovation is automating the conversion of unstructured text into the structured instruction-response pairs needed for instruction tuning.
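As a rough illustration of the pair-generation idea, a client could prompt a generator LLM to derive one instruction-response pair from each chunk of its local text. This is a minimal sketch, not the paper's exact pipeline; the prompt wording and the `generate` callable are assumptions:

```python
def make_pair(chunk: str, generate) -> dict:
    """Turn one unstructured text chunk into an instruction-response pair.

    `generate(prompt) -> str` is a placeholder for any local
    text-generation call; the prompt template is illustrative.
    """
    prompt = (
        "Read the passage below. Write one instruction a user might ask "
        "that the passage can answer, then the answer.\n"
        "Format exactly as:\nInstruction: ...\nResponse: ...\n\n"
        f"Passage:\n{chunk}"
    )
    text = generate(prompt)
    # Split the model's output into the two fields.
    instruction, _, response = text.partition("Response:")
    return {
        "instruction": instruction.replace("Instruction:", "").strip(),
        "response": response.strip(),
    }

# Example with a stub generator (a real client would call a local LLM).
stub = lambda p: "Instruction: What is FedAvg?\nResponse: A federated averaging algorithm."
print(make_pair("FedAvg averages client model updates...", stub))
```

The point of automating this step is that each client can bootstrap its own instruction-tuning dataset from raw documents, so no manually labeled pairs ever need to leave (or even exist on) the client.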
This is relevant to LLM-based multi-agent systems because it provides a way to train agents on sensitive data in a decentralized manner: each agent could act as a client contributing to the overall model training without directly exposing its data.
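Concretely, once each client has fine-tuned on its own generated pairs, only the resulting parameters need to be shared. Below is a minimal FedAvg-style aggregation sketch, assuming each client's update arrives as a dict of NumPy arrays; the parameter names and weighting scheme are assumptions, and the paper's exact aggregation may differ:

```python
import numpy as np

def fed_avg(client_params: list[dict[str, np.ndarray]],
            client_sizes: list[int]) -> dict[str, np.ndarray]:
    """Weighted average of client parameter dicts (weights = local data sizes)."""
    total = sum(client_sizes)
    keys = client_params[0].keys()
    return {
        k: sum(n * p[k] for n, p in zip(client_sizes, client_params)) / total
        for k in keys
    }

# One federated round: two clients send parameters, the server averages them.
clients = [{"w": np.ones(3)}, {"w": np.zeros(3)}]
global_params = fed_avg(clients, client_sizes=[100, 300])
# -> {"w": array([0.25, 0.25, 0.25])}, since (100*1 + 300*0) / 400 = 0.25
```

Raw text stays on each client; only model parameters cross the network, which is what makes the setup usable for agents holding private corpora.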