How can I test LLMs' social skills in games?
TEXTARENA
TextArena is a platform for evaluating and training Large Language Models (LLMs) in competitive, text-based games. It offers a diverse collection of single-player, two-player, and multi-player games designed to assess capabilities such as negotiation, deception, and theory of mind, which traditional benchmarks often neglect. Most relevant to LLM-based multi-agent systems is its dynamic online leaderboard, which uses TrueSkill™ ratings: LLMs compete against each other and against humans, which makes relative performance and its improvement visible. The platform also includes soft-skill profiling, detailed game documentation, and an easy-to-use framework inspired by OpenAI Gym that simplifies RL training and the development of novel agentic LLM systems.
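To make the Gym-inspired design concrete, here is a minimal sketch of what a reset/step loop for a two-player text game could look like. It is not TextArena's actual API: `NimEnv`, `play`, `optimal_agent`, and `random_agent` are hypothetical names invented for illustration, and the game (a simple version of Nim) is only a stand-in for the platform's games. In a real setup, each agent would presumably wrap an LLM call that maps the text observation to a text action.

```python
import random
import re
from typing import Callable, Dict, Tuple

# An agent maps a text observation to a text action (hypothetical interface).
Agent = Callable[[str], str]


class NimEnv:
    """Hypothetical two-player text game with a Gym-like reset/step interface."""

    def __init__(self, stones: int = 7, max_take: int = 3) -> None:
        self.initial_stones = stones
        self.max_take = max_take
        self.stones = stones
        self.current_player = 0

    def reset(self) -> Tuple[int, str]:
        """Start a new game; return the id of the player to move and its observation."""
        self.stones = self.initial_stones
        self.current_player = 0
        return self.current_player, self.observation()

    def observation(self) -> str:
        return (f"There are {self.stones} stones. "
                f"Take 1 to {self.max_take}; whoever takes the last stone wins.")

    def step(self, action: str) -> Tuple[bool, Dict[int, int]]:
        """Apply a text action for the current player; return (done, rewards)."""
        try:
            take = int(action.strip())
        except ValueError:
            take = 0  # unparseable text is treated as an illegal move
        mover = self.current_player
        if not 1 <= take <= min(self.max_take, self.stones):
            # Assumption of this sketch: an illegal move forfeits the game.
            return True, {mover: -1, 1 - mover: 1}
        self.stones -= take
        if self.stones == 0:
            return True, {mover: 1, 1 - mover: -1}
        self.current_player = 1 - mover
        return False, {}


def _stones(observation: str) -> int:
    """Read the stone count (the first number) out of the text observation."""
    return int(re.search(r"\d+", observation).group())


def optimal_agent(observation: str) -> str:
    """Leave the opponent a multiple of 4 stones (optimal when max_take is 3)."""
    return str(_stones(observation) % 4 or 1)


def random_agent(observation: str) -> str:
    """Take a random legal number of stones."""
    return str(random.randint(1, min(3, _stones(observation))))


def play(env: NimEnv, agents: Dict[int, Agent]) -> Dict[int, int]:
    """Run one episode, alternating between agents until the game ends."""
    player_id, obs = env.reset()
    done, rewards = False, {}
    while not done:
        action = agents[player_id](obs)
        done, rewards = env.step(action)
        player_id, obs = env.current_player, env.observation()
    return rewards


if __name__ == "__main__":
    print(play(NimEnv(), {0: optimal_agent, 1: random_agent}))
```

Because both observations and actions are plain strings, the same loop works whether an agent is a scripted policy, a human typing at a prompt, or an LLM. The per-player rewards returned at the end of an episode are the kind of win/loss outcome a rating system such as TrueSkill could consume.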