Can AI agents perpetuate stereotypes?
Social coordination perpetuates stereotypic expectations and behaviors across generations in deep multi-agent reinforcement learning
October 3, 2024
https://arxiv.org/pdf/2410.01763

This research investigates how AI agents trained with deep reinforcement learning can develop and perpetuate stereotypes even in the absence of any built-in bias. When tasked with coordinating in an environment where skills are statistically correlated with group labels, the agents learn to rely on those labels as shortcuts for efficient interaction, disadvantaging minority groups. This behavior persists across agent generations even after the group differences are removed.
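A minimal, self-contained sketch (not the authors' code or environment) of the core mechanism: when a visible group label statistically predicts a hidden skill, a simple reward-driven learner converges to a label-conditioned policy, i.e., a stereotype. The labels, skills, correlation strength, and learning parameters below are all illustrative assumptions.

```python
# Toy sketch: a tabular, bandit-style learner coordinating with partners whose
# hidden skill is statistically correlated with a visible group label.
# All numbers (correlation strength, learning rate, episode count) are illustrative.
import random

GROUPS = ["square", "circle"]               # arbitrary visible labels
SKILLS = ["hunter", "gatherer"]             # hidden skills the learner must complement
P_HUNTER = {"square": 0.8, "circle": 0.2}   # assumed skill-label correlation

# Q[label][action]: estimated value of playing `action` when the partner shows `label`.
Q = {g: {a: 0.0 for a in SKILLS} for g in GROUPS}
alpha = 0.1   # learning rate
eps = 0.1     # exploration rate

for _ in range(20_000):
    label = random.choice(GROUPS)
    skill = "hunter" if random.random() < P_HUNTER[label] else "gatherer"

    # epsilon-greedy choice of the learner's own role
    if random.random() < eps:
        action = random.choice(SKILLS)
    else:
        action = max(Q[label], key=Q[label].get)

    # Coordination succeeds only when the roles complement each other.
    reward = 1.0 if action != skill else 0.0
    Q[label][action] += alpha * (reward - Q[label][action])

# The greedy policy now maps visible labels to roles -- a learned stereotype --
# even though the label only *correlates* with the hidden skill.
for g in GROUPS:
    print(g, "->", max(Q[g], key=Q[g].get), {a: round(v, 2) for a, v in Q[g].items()})
```

Because the label raises expected coordination payoff, the learner never needs to observe the partner's actual skill; the label-based shortcut is individually rational but systematically penalizes group members whose skill does not match the expectation.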
Key points for LLM-based multi-agent systems:
- Stereotype emergence: Like the agents in the study, LLMs could learn and reinforce stereotypes from statistical correlations in training data, producing biased outcomes even without any explicit bias in their objectives.
- Generational transmission: Stereotypes learned by one generation of LLMs could be unintentionally passed down to subsequent generations through the training process, making them difficult to eradicate (see the sketch after this list).
- Impact of coordination: The study highlights how the pressure for efficient coordination among LLMs in multi-agent systems might inadvertently promote reliance on stereotypes, especially in large-scale deployments where individual identification is difficult.
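Continuing the same toy setting (again, not the paper's actual training setup), the sketch below illustrates how a label-conditioned policy can outlive the group differences that produced it: a new generation that learns partly from the previous generation's behavior keeps the label-role mapping even though the environment no longer rewards it. The imitation bonus is an assumed stand-in for learning from, or coordinating with, predecessor agents.

```python
# Toy sketch of generational transmission: a "child" learner gets environment
# reward plus a small bonus for matching its "parent", after the skill-label
# correlation has been removed. Numbers and the mixing weight are illustrative.
import random

GROUPS = ["square", "circle"]
ROLES = ["hunter", "gatherer"]
parent_policy = {"square": "gatherer", "circle": "hunter"}  # inherited stereotype

Q = {g: {a: 0.0 for a in ROLES} for g in GROUPS}
alpha, eps, imitation_bonus = 0.1, 0.1, 0.2

for _ in range(20_000):
    label = random.choice(GROUPS)
    skill = random.choice(ROLES)          # label no longer predicts skill

    action = (random.choice(ROLES) if random.random() < eps
              else max(Q[label], key=Q[label].get))

    reward = 1.0 if action != skill else 0.0   # coordination reward, now label-neutral
    if action == parent_policy[label]:         # small bonus for matching the parent
        reward += imitation_bonus
    Q[label][action] += alpha * (reward - Q[label][action])

for g in GROUPS:
    print(g, "->", max(Q[g], key=Q[g].get))  # still mirrors the parent's label-role mapping
```

Since the environment is now indifferent between roles, it provides no pressure to unlearn the inherited mapping, so even a weak signal from the previous generation is enough to keep the stereotype in place.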