Can MADRL agents defend against cyberattacks?
Multi-Agent Actor-Critics in Autonomous Cyber Defense
October 15, 2024
https://arxiv.org/pdf/2410.09134

This paper investigates the application of multi-agent reinforcement learning (specifically Actor-Critic algorithms) to autonomous cyber defense. The researchers train agents that learn to collaborate and improve at defending a simulated network (the CybORG environment) against attacks.
Key points relevant to LLM-based multi-agent systems:
- Centralized Training with Decentralized Execution (CTDE): Agents learn from a shared, centralized critic but act on their own local observations, a pattern relevant to LLM agents that must collaborate while each sees only partial information.
- Addressing Non-Stationarity: On-policy algorithms (A2C and PPO) are used to tackle the non-stationarity inherent in multi-agent learning, an important consideration when developing interacting LLMs.
- Discrete Action Spaces: This research focuses on discrete action spaces, which aligns well with the discrete nature of text-based actions typically used by LLMs.
- Parameter Sharing & Action Masking: While not the primary focus, parameter sharing and action masking are relevant techniques for managing and improving the training efficiency of multi-agent LLM systems.
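To make the CTDE pattern concrete, here is a minimal, dependency-free sketch (not the paper's implementation): each actor chooses actions from its local observation only, while a centralized critic scores the joint observation during training. The `Actor` and `CentralCritic` classes and their placeholder logic are illustrative assumptions, not code from the paper.

```python
import random

class Actor:
    """Decentralized policy: acts from its own local observation only."""
    def __init__(self, n_actions, seed):
        self.n_actions = n_actions
        self.rng = random.Random(seed)

    def act(self, local_obs):
        # Placeholder policy: uniform over actions. A real actor would map
        # local_obs through a learned network (e.g. trained with A2C/PPO).
        return self.rng.randrange(self.n_actions)

class CentralCritic:
    """Centralized value estimate: sees every agent's observation during training."""
    def value(self, joint_obs):
        # Placeholder: a real critic would be a learned function of the
        # concatenated observations; here we just average the features.
        return sum(sum(o) for o in joint_obs) / max(len(joint_obs), 1)

actors = [Actor(n_actions=4, seed=i) for i in range(2)]
obs = [[0.1, 0.2], [0.3, 0.4]]
actions = [a.act(o) for a, o in zip(actors, obs)]  # execution: local obs only
v = CentralCritic().value(obs)                     # training: joint obs
```

The key structural point is the asymmetry: the critic's input grows with the number of agents (it exists only at training time), while each actor's input stays local, so execution remains fully decentralized.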