Can multi-agent RL fine-tune LLMs better than PPO?
Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning
October 10, 2024
https://arxiv.org/pdf/2410.06101

This research introduces CORY, a novel method for fine-tuning Large Language Models (LLMs) by framing the process as a sequential cooperative multi-agent reinforcement learning problem.
Instead of training a single LLM, CORY duplicates the LLM into a "pioneer" and an "observer" that learn collaboratively. The key mechanisms (sketched in the code below) are:
- Knowledge Transfer: The observer learns from both the user query and the pioneer's response, helping it better align with the task reward while staying close to the original LLM's output distribution and avoiding distribution collapse.
- Role Exchange: The pioneer and observer periodically switch roles, preventing the observer from becoming overly reliant on the pioneer's output and ensuring both LLMs can operate independently.
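The minimal Python sketch below illustrates how these two mechanisms fit together in a training loop. The `Agent` class, `task_reward` function, shared reward, and `swap_every` interval are illustrative placeholders (assumptions), not the paper's implementation; in CORY, both agents are copies of the same LLM updated with an RL algorithm such as PPO.

```python
import random

class Agent:
    """Toy stand-in for an LLM policy; the real method fine-tunes two
    copies of the same LLM with an RL algorithm such as PPO."""
    def __init__(self, name):
        self.name = name

    def generate(self, prompt):
        # Placeholder generation; a real agent samples a response
        # from the LLM conditioned on `prompt`.
        return f"<{self.name} response to: {prompt}>"

    def update(self, prompt, response, reward):
        # Placeholder policy update; a real agent would take an RL step
        # on (prompt, response) weighted by `reward`.
        pass

def task_reward(query, response):
    # Placeholder task reward model (assumption, not from the paper).
    return random.random()

def cory_step(pioneer, observer, query):
    # Pioneer answers the query directly.
    y1 = pioneer.generate(query)
    # Knowledge transfer: the observer sees both the query and the
    # pioneer's response.
    y2 = observer.generate(query + "\n" + y1)
    # Both agents are rewarded on the task; sharing the summed reward
    # is an assumption of this sketch.
    r = task_reward(query, y1) + task_reward(query, y2)
    pioneer.update(query, y1, r)
    observer.update(query + "\n" + y1, y2, r)

def train(queries, swap_every=100):
    pioneer, observer = Agent("A"), Agent("B")
    for step, query in enumerate(queries):
        cory_step(pioneer, observer, query)
        # Role exchange: periodically swap pioneer and observer so
        # neither copy becomes dependent on the other's output.
        if (step + 1) % swap_every == 0:
            pioneer, observer = observer, pioneer

train([f"query {i}" for i in range(300)])
```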
Experiments show that, compared to single-agent PPO fine-tuning, CORY achieves comparable or better task performance with more stable training, greater robustness to distribution collapse, and a better trade-off between task reward and KL divergence.
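For context, the single-agent RL fine-tuning objective that PPO-based methods optimize is typically written as follows (standard RLHF formulation, stated here as background rather than taken from the paper):

```latex
\max_{\pi_\theta} \;
\mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)}
\big[ r(x, y) \big]
\; - \; \beta \,
\mathrm{KL}\!\big[ \pi_\theta(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \big]
```

Here r is the task reward, pi_ref is the original (pre-fine-tuning) LLM, and beta weights the KL penalty; the reported reward-vs-KL balance measures how well each method maximizes the first term without letting the second term blow up.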