Sakana AI has developed a 7-billion-parameter orchestration model that dynamically coordinates larger frontier models like GPT-5, Claude Sonnet 4, and Gemini 2.5 Pro without human intervention. The system, called RL Conductor, uses reinforcement learning to analyze incoming queries and automatically route them to the best-suited worker LLMs in real time.

The core problem Sakana targets is brittleness. Hand-coded LangChain pipelines fail the moment input patterns shift. Static routing rules designed for one query distribution collapse under a different workload. RL Conductor eliminates this fragility by learning dynamic routing policies that adapt as the data distribution shifts.

The model works as an intelligent traffic controller. It ingests a user query, assesses its characteristics, and decides which worker models handle which subtasks. It then coordinates responses across multiple agents. This automated orchestration outperforms both individual frontier models tested in isolation and manually designed multi-agent systems on reasoning and coding benchmarks.

The efficiency gains matter. A 7B conductor managing a consortium of larger models beats expensive single-model calls and hand-engineered workflows. This flips the scaling calculus. Instead of throwing more compute at a single model, you distribute workload intelligently across existing capacity.

Reinforcement learning is the technical innovation here. Rather than relying on rule-based routing, Sakana's team trained Conductor to optimize task allocation through trial and error, rewarding it for both accuracy and efficiency. This teaches the model a kind of contextual judgment that is impossible to hard-code.
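The trial-and-error signal can be illustrated with a toy epsilon-greedy bandit: the router samples a worker, receives a reward of accuracy minus a cost penalty (mirroring the accuracy/efficiency objective above), and updates its estimates. The worker names, accuracies, and costs are invented for illustration; Sakana's actual training setup is not public.

```python
# Toy sketch of reward-driven routing: epsilon-greedy bandit where
# reward = accuracy - cost, so a cheap, nearly-as-accurate worker wins.
# All numbers are hypothetical.
import random

workers = ["small-worker", "large-worker"]
# Simulated (accuracy, cost) per worker for one query type.
env = {"small-worker": (0.90, 0.10), "large-worker": (0.95, 0.90)}

q_values = {w: 0.0 for w in workers}  # running reward estimates
counts = {w: 0 for w in workers}
random.seed(0)

for step in range(2000):
    # Explore 10% of the time, otherwise exploit the best-known worker.
    if random.random() < 0.1:
        choice = random.choice(workers)
    else:
        choice = max(workers, key=q_values.get)
    acc, cost = env[choice]
    reward = acc - cost  # accuracy rewarded, inefficiency penalized
    counts[choice] += 1
    # Incremental mean update of the reward estimate.
    q_values[choice] += (reward - q_values[choice]) / counts[choice]

best = max(workers, key=q_values.get)
print(best)  # small-worker: 0.80 reward beats the large worker's 0.05
```

The cost term is what flips the outcome: without it, the router would always escalate to the largest model, which is exactly the behavior the conductor is trained away from.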

The implications extend beyond benchmark wins. If smaller models can effectively orchestrate frontier LLMs, teams avoid the latency and cost penalties of querying GPT-5 or Claude Sonnet 4 for every task. Conductor filters requests, specializes work, and batches similar queries before routing them downstream. The