Alibaba researchers unveiled SkillWeaver, a framework that slashes agent token consumption by 99 percent by eliminating the need to load every available tool upfront. The system addresses a core problem in enterprise AI: agents managing hundreds of tools often struggle to identify which one to use for each workflow step.
SkillWeaver works by creating an execution graph for a given task, then selecting only the relevant skills needed at each node. The framework introduces Skill-Aware Decomposition (SAD), a feedback mechanism that allows agents to iteratively fetch and evaluate candidate tools rather than maintaining the entire tool set in context.
The distinction matters operationally. Traditional tool-routing approaches load all available skills into the agent's context window, inflating token counts and slowing inference. SkillWeaver's compositional design loads tools on demand, dramatically reducing computational overhead while improving routing accuracy.
The 99 percent token reduction directly impacts production costs. Enterprises running complex multi-step workflows across hundreds of tools burn through tokens quickly. Fewer tokens per task means lower API bills, faster response times, and the ability to handle longer workflows within standard context windows.
Real-world enterprise AI rarely involves single-tool operations. Modern systems orchestrate across databases, APIs, internal services, and specialized models. An agent handling customer service might need CRM integration, knowledge base search, billing tools, and ticket creation capabilities. Loading all four tools plus dozens of others into every inference step wastes tokens and introduces noise into tool selection decisions.
The feedback loop in SAD appears critical. Rather than making a single tool choice per step, the agent can evaluate multiple candidates and refine its selection. This iterative approach improves decision quality without requiring full tool enumeration.
Alibaba's framework addresses real pain points in production AI systems. As enterprises scale agent deployments, token efficiency becomes non-negotiable. The 99 percent reduction
