Researchers at the University of Illinois Urbana-Champaign and Stanford University developed RecursiveMAS, a framework that fundamentally changes how multi-agent AI systems communicate. Instead of exchanging text sequences, agents share information directly in embedding space, using the dense vector representations that neural networks operate on internally.
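To make the contrast concrete, here is a minimal Python sketch. Everything in it is hypothetical, not the paper's implementation: the toy hash-based `encode` function stands in for a neural encoder, and the agent functions stand in for full models. The point is the shape of the handoff: a text-communicating agent must verbalize its state and force the receiver to re-encode it, while an embedding-communicating agent hands over the vector directly.

```python
import math

EMBED_DIM = 8  # toy embedding width; real systems use the model's hidden size

def encode(text, dim=EMBED_DIM):
    """Toy stand-in for a neural encoder: hash characters into a unit vector."""
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch) / 1000.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def agent_a_text(task):
    # Text-based handoff: agent A must verbalize its internal state.
    return "summary of " + task

def agent_a_embedding(task):
    # Embedding-based handoff: agent A passes its dense representation as-is.
    return encode(task)

def agent_b(message):
    # Agent B must re-encode text, but can consume a vector directly.
    return encode(message) if isinstance(message, str) else message

task = "refactor the parser"
via_text = agent_b(agent_a_text(task))            # task -> text -> re-encode
via_embedding = agent_b(agent_a_embedding(task))  # task -> vector, done
```

In the text path, agent A's internal representation is lossily flattened into a string and then rebuilt by agent B; in the embedding path, the representation crosses the boundary intact.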
The efficiency gains are substantial. RecursiveMAS accelerates multi-agent inference by 2.4x and cuts token usage by 75 percent. These improvements directly reduce computational costs and latency, two major pain points in deployed multi-agent systems. Text-based agent communication forces systems to generate full sequences, serialize them, deserialize them on receipt, and then process them again, creating unnecessary overhead at every step.
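The serialize/deserialize round trip described above can be sketched in a few lines. The payload sizes here are illustrative only (a hypothetical 40-token message versus a hypothetical 16-dimensional float32 vector), not measurements from the paper:

```python
import json
import struct

# Text path: generate a sequence, serialize it, deserialize it, re-tokenize it.
text_message = " ".join("tok%d" % i for i in range(40))   # hypothetical 40-token output
serialized = json.dumps({"content": text_message})        # serialize for transport
received = json.loads(serialized)["content"]              # deserialize on receipt
retokenized = received.split()                            # re-process on the far side

# Embedding path: a fixed-size binary vector crosses the boundary once.
embedding = [0.1] * 16                                    # hypothetical 16-dim message
packed = struct.pack("%df" % len(embedding), *embedding)  # 16 float32s = 64 bytes

text_bytes = len(serialized.encode("utf-8"))
vector_bytes = len(packed)
```

Beyond the byte counts, the text path costs a full autoregressive generation pass to produce the message and another encoding pass to consume it; the embedding path needs neither.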
By skipping the text layer entirely, RecursiveMAS lets agents pass compressed, structured information directly through embedding space. This preserves the semantic content agents need to collaborate while eliminating the overhead of natural-language serialization. It also makes the entire multi-agent system trainable end to end, rather than treating each agent as an independently optimized module.
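End-to-end training is possible precisely because the message between agents stays continuous, so gradients can flow through it. A deliberately tiny sketch, with scalar "agents" and made-up values rather than any real model, shows the mechanics:

```python
# Two "agents" modeled as scalar linear maps, chained through a
# continuous message h. All values here are made up for illustration.
w1, w2 = 0.5, 0.5        # agent A's and agent B's single parameter
x, target = 1.0, 2.0     # input and desired final output
lr = 0.05                # learning rate

losses = []
for _ in range(50):
    h = w1 * x                    # agent A emits a continuous message
    y = w2 * h                    # agent B consumes it directly
    loss = (y - target) ** 2
    losses.append(loss)
    # Chain rule: the error signal flows back through h into BOTH agents.
    # If h were discretized into text, this gradient path would be cut.
    dy = 2.0 * (y - target)
    grad_w2 = dy * h
    grad_w1 = dy * w2 * x
    w2 -= lr * grad_w2
    w1 -= lr * grad_w1
```

After a few dozen steps both parameters adjust jointly and the loss falls toward zero; that shared gradient path is the essence of treating the agent pipeline as one differentiable system.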
Experiments demonstrate accuracy improvements across three demanding domains: code generation, medical reasoning, and search tasks. That the framework gains speed and accuracy at the same time suggests embedding-space communication preserves, and may even enrich, the information agents exchange rather than merely compressing it.
The work addresses a practical constraint in scaling multi-agent systems. As organizations deploy more complex AI pipelines with multiple specialized agents, communication overhead becomes a bottleneck. Each agent call consumes tokens, introduces latency, and complicates training. RecursiveMAS removes these constraints without requiring architectural changes to underlying models.
The implications extend beyond efficiency metrics. Direct embedding-space communication might enable new forms of agent coordination that text-based systems cannot express effectively. Agents optimizing for human-readable text face different constraints than agents optimizing for compact vector representations. This shift could unlock capabilities in multi-agent reasoning that remain out of reach when every exchange must pass through natural language.
