Jensen Huang's claim that artificial intelligence has achieved AGI collapses under scrutiny. Frontier models score 0.37% on ARC-AGI-3, a benchmark of novel interactive environments whose rules must be discovered through play rather than given up front. Humans solve these tasks 100% of the time.
ARC-AGI-3 measures what matters: whether AI can handle genuinely new problems without relevant training data. Current systems excel at pattern-matching within their training distributions but fail catastrophically outside them. This gap defines the real boundary between AI capability and hype. Superhuman performance on exams means little when models cannot figure out simple interactive games with no predefined instructions.
The infrastructure shift reveals where real value concentrates. This week saw $25 billion in deals targeting systems, not models themselves. IBM acquired Confluent for $11 billion to control real-time data streaming. Eli Lilly paid $2.75 billion for Insilico's drug development pipelines. Physical Intelligence raised $1 billion for robot control systems. Building better language models has become table stakes. Ownership of the data flow between models and the physical world now generates defensible competitive advantage.
This inversion matters for enterprise strategy. Companies cannot compete by chasing each frontier model release; NVIDIA, Meta, and OpenAI will handle foundation models. Defensible moats exist instead in specialized data pipelines, real-time inference infrastructure, and domain-specific integration layers that connect AI to actual business processes.
The ARC-AGI-3 results expose what Huang's AGI claim ignores. The industry conflates benchmark dominance with general intelligence. Current architectures remain brittle. They perform well on memorization tasks and can interpolate within known patterns. They cannot extrapolate to truly novel situations. Until models demonstrate consistent adaptation to novel environments, AGI remains marketing language, not technical reality.