Convergence Dynamics of Agent-to-Agent Interactions with Misaligned Objectives
This paper develops a theoretical framework to study agent-to-agent interactions in a simplified in-context linear regression setting, where each agent is a single-layer transformer with linear self-attention. The authors analyze the coupled dynamics when two such agents update from each other's outputs under potentially misaligned fixed objectives.
Why it matters
This work provides a mechanistic framework to understand how prompt geometry and objective misalignment impact the stability, bias, and robustness of multi-agent LLM systems.
Key Points
- Theoretical framework for agent-to-agent interactions in a linear regression setting
- Agents modeled as single-layer transformers with linear self-attention
- Analysis of the dynamics when agents update under misaligned fixed objectives
- Misalignment leads to a biased equilibrium where neither agent reaches its target
- Contrast with an adaptive multi-agent setting where a helper agent updates the objective
Details
The paper develops a theoretical framework to study agent-to-agent interactions in a simplified in-context linear regression setting. Each agent is modeled as a single-layer transformer with linear self-attention (LSA), trained to implement gradient-descent-like updates on a quadratic regression objective from in-context examples. The authors then analyze the coupled dynamics when two such agents alternately update from each other's outputs under fixed objectives that may be misaligned. They find that misalignment leads to a biased equilibrium where neither agent reaches its target, and that the residual errors can be predicted from the gap between the two objectives and the geometry induced by the prompt. The paper contrasts this fixed-objective regime with an adaptive multi-agent setting, in which a helper agent updates a turn-based objective so that the main agent effectively takes a Newton-like step, eliminating the plateau and accelerating its convergence. Experiments with trained LSA agents and with GPT-5-mini on in-context linear regression tasks are consistent with the theoretical predictions.
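The biased equilibrium can be illustrated with a toy model. The sketch below is not the paper's construction: it assumes each agent's turn is a plain gradient step on its own quadratic objective, that both agents share one prompt covariance S = X^T X / n, and that the targets w_a and w_b, the step size eta, and all function names are hypothetical choices made for illustration. Under those assumptions the alternating updates settle at a point between the two targets, and the closed-form fixed point depends only on the targets and on S.

```python
import numpy as np

def prompt_covariance(X):
    """Empirical covariance of the in-context inputs (assumed shared by both agents)."""
    return X.T @ X / X.shape[0]

def alternate(S, w_a, w_b, eta, turns):
    """Alternate gradient steps: agent A pulls toward w_a, then agent B toward w_b."""
    w = np.zeros_like(w_a)
    for _ in range(turns):
        w = w - eta * S @ (w - w_a)  # agent A's turn on 0.5 (w - w_a)^T S (w - w_a)
        w = w - eta * S @ (w - w_b)  # agent B's turn on 0.5 (w - w_b)^T S (w - w_b)
    return w

def predicted_fixed_point(S, w_a, w_b, eta):
    """Fixed point of one A-then-B round: solve (2I - eta S) w = (I - eta S) w_a + w_b."""
    I = np.eye(S.shape[0])
    return np.linalg.solve(2 * I - eta * S, (I - eta * S) @ w_a + w_b)

def newton_step(S, w, w_target):
    """One Newton step on a single quadratic: gradient rescaled by the inverse Hessian S."""
    grad = S @ (w - w_target)
    return w - np.linalg.solve(S, grad)

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))          # hypothetical prompt inputs
S = prompt_covariance(X)
w_a = np.array([1.0, 0.0, -1.0])      # hypothetical target of agent A
w_b = np.array([0.0, 2.0, 1.0])       # hypothetical target of agent B
eta = 0.1

w_star = alternate(S, w_a, w_b, eta, turns=2000)
w_pred = predicted_fixed_point(S, w_a, w_b, eta)
w_newton = newton_step(S, np.zeros(3), w_a)

print("distance to A's target:", np.linalg.norm(w_star - w_a))
print("distance to B's target:", np.linalg.norm(w_star - w_b))
print("matches closed form:", np.allclose(w_star, w_pred))
print("Newton step reaches A's target:", np.allclose(w_newton, w_a))
```

In this toy model the residual after B's turn works out to (2I - eta S)^{-1} (I - eta S) (w_a - w_b), so it scales with the objective gap and is shaped by S. The `newton_step` function is only a stand-in for the adaptive regime: it shows why a Newton-like update on a single quadratic removes the plateau in one step, not how the paper's helper agent constructs that update.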