New model releases, benchmarks, and comparisons — from frontier labs and open-source projects.

Models

Satya Nadella warns that AI could hollow out entire industries, echoing the damage done by globalization

Microsoft CEO Satya Nadella warns that AI threatens to concentrate economic value among a handful of frontier models, potentially hollowing out entire…

14h ago
Models

When deep research isn't enough for your business: Sakana AI launches 'ultra deep research' agent for 100+ page reports in 8 hours

Tokyo-based Sakana AI launched Marlin, its first commercial product, positioning it as a "Virtual CSO" for enterprise research. The autonomous agent a…

14h ago
Models

85% of IT teams claim every AI agent is under control. Only 42% actually know who owns them.

IT teams are masking a severe governance crisis behind confident rhetoric. Eighty-five percent claim ownership exists for every AI agent they deploy. …

14h ago
Models

The PM’s Playbook for Shipping AI Features That Actually Work in Production

Product managers chasing AI features face a brutal reality: what works …

14h ago
Models

Inside the fight over Claude Mythos 5

Anthropic faced a weekend confrontation with the Trump administration over its latest AI model releases. The company received a US export control dire…

14h ago
Models

Vibe coding can build your pipeline. It can't explain it six months later

AI coding agents are accelerating data engineering by generating pipelines, orchestrations, and infrastructure from natural language prompts. This spe…

Yesterday
Models

AI Weekly Issue #503: Washington just repriced frontier AI

The U.S. government moved to restrict access to Anthropic's latest frontier models within days of their release, triggering a broader reassessment of …

Yesterday
Models

MCP solved tool calling. A2A solved coordination. What solves transport?

Distributed computing keeps cycling through the same pattern: competing standards emerge, then one wins through simplicity. REST beat CORBA, DCOM, and…

Yesterday
Models

Pokémon Go players unwittingly contributed to tech with military drone uses

Pokémon Go players have unknowingly supplied training data for military drone technology. The game's massive user base generated billions of geotagged…

Yesterday
Models

The FBI built a small town to simulate cyberattacks

The FBI has constructed a full-scale fake town in Huntsville, Alabama to train agents on defending against cyberattacks. The Cyber Range spans 22,000 …

Yesterday
Models

Google Cloud's Open Knowledge Format turns scattered docs into Markdown files for AI agents

Google Cloud released Open Knowledge Format (OKF), a standardized specification for converting scattered organizational documents into structured Mark…

2 days ago
Models

AI coding agents find the right file but miss the exact lines that matter, study shows

AI coding agents excel at locating the correct file when fixing bugs but fail to identify the specific lines that need modification. A new benchmark c…

2 days ago
Models

The Subsidy Ended: What Tool-Using Agents Actually Cost

GitHub Copilot shifted to metered billing on June 1, ending unlimited usage for paid subscribers and introducing an explicit cost structure that revea…

2 days ago
Models

Cheaper, faster, and culturally aware, Avataar’s video AI is built for India’s scale

Avataar AI launched a distilled video generation model priced at $0.005 per second, targeting India's massive market at a fraction of typical AI video…

2 days ago
Models

Google sues Chinese cybercrime network that used Gemini to automate scams

Google filed suit against a Chinese cybercrime network that weaponized its Gemini AI model to automate large-scale scam operations. The defendants all…

2 days ago
Models

Open model Kimi K2.7 Code undercuts GPT-5.5 and Claude by up to 12x on price per token

Moonshot AI released Kimi K2.7 Code, an open-weights model with one trillion parameters designed for programming tasks. The model trails GPT-5.5 and C…

2 days ago
Models

US government forces Anthropic to disable Claude Fable 5 and Mythos 5 for all customers worldwide

The US government has ordered Anthropic to disable Claude Fable 5 and Mythos 5 globally, citing jailbreak vulnerabilities. Anthropic is complying but …

2 days ago
Models

AI Weekly Issue #495: Musk, Zuckerberg killed Trump's AI safety order in three phone calls

Elon Musk, Mark Zuckerberg, and David Sacks blocked Trump's proposed AI safety executive order through three phone calls on Wednesday night, derailing…

2 days ago
Models

AI Weekly Issue #484: Your AI chats can be used against you in court

Conversations with AI chatbots now carry legal consequences. Courts are treati…

2 days ago
Models

Amazon CEO reportedly raised Anthropic model concerns before government crackdown

Amazon CEO Andy Jassy raised security and safety concerns about Anthropic's AI models before the company restricted global access to Claude 3.5 Sonnet…

2 days ago
Models

Anthropic blocks all public access to Claude Fable 5, Mythos 5 following US government order — what enterprises should do

The US government issued an export control directive ordering Anthropic to suspend all access to Claude Fable 5 and Claude Mythos 5 for foreign nation…

3 days ago
Models

New AI model called "Count Anything" does exactly what it says, and that's harder than it sounds

Researchers have unveiled "Count Anything," an AI model that counts objects in images across vastly different contexts using only a text prompt. The s…

3 days ago
Models

Google Research's Gemini-SQL2 tops text-to-SQL benchmarks by a wide margin

Google Research unveiled Gemini-SQL2, a specialized system that converts natural language queries into executable SQL commands. Built on the Gemini 3.…

3 days ago
Models

Microsoft's SkillOpt boosts GPT-5.5 by using nothing but a trained Markdown file

Microsoft researchers partnered with three Chinese universities to develop SkillOpt, a novel optimization technique that improves AI agent performance…

3 days ago
Models

Claude Fable 5 outpaces GPT-5.5 by 13 points on FrontierMath's toughest problems

Anthropic's Claude Fable 5 has achieved 88 percent accuracy on FrontierMath's hardest problem tier, substantially outperforming OpenAI's GPT-5.5, whic…

3 days ago
Models

Kimi K2.7-Code cuts thinking tokens 30% — but practitioners say the benchmarks don't check out

Moonshot AI unveiled Kimi K2.7-Code this week, claiming the open-source coding model reduces inference "thinking tokens" by 30% while delivering doubl…

3 days ago
Models

Google researchers introduce 'faithful uncertainty,' allowing LLMs to offer best guesses instead of hallucinations

Google researchers have identified a path forward in the stubborn hallucination problem plaguing large language models. Their solution centers on "fai…

3 days ago
Models

AI Weekly Issue #490: Anthropic just had AI's biggest week of 2026

Anthropic transformed its market position in five days, executing moves that reshape AI infrastructure and enterprise adoption. The company's Q1 reven…

3 days ago
Models

Radar Trends to Watch: June 2026

Autonomous agents are moving beyond isolated task execution into full operational management. Cloudflare and Stripe have deployed an agent capable of …

3 days ago
Models

Anthropic’s safety warnings may have just backfired — the government has pulled the plug on its most powerful AI

Anthropic has publicly disputed a government decision to halt deployment of one of its most powerful AI models, citing safety concerns over a discover…

3 days ago
Models

NanoClaw and JFrog launch 'immune system' to block AI agents from downloading malicious code

NanoClaw and JFrog have launched a security integration designed to prevent AI agents from downloading malicious code. The partnership hardwires NanoC…

4 days ago
Models

PixelRAG beats text parsers on accuracy and cuts AI agent token costs 10x

A new retrieval-augmented generation system bypasses text parsing entirely, improving accuracy while cutting token costs for AI agents tenfold. P…

4 days ago
Models

The AI industry's platform trap is starting to look a lot like Microsoft's

Anthropic faces mounting pressure from customers, partners, and investors over deliberate restrictions on its new Mythos model paired with the company…

4 days ago
Models

AI Weekly Issue #485: When AI teaches AI, it teaches in secret

AI model training has entered a phase where artificial intelligence systems teach other AI systems in ways humans cannot directly observe or understan…

4 days ago
Models

I Let an AI Agent Run 40 Experiments While I Slept

An AI agent running unsupervised machine learning experiments delivered measurable improvements overnight but also wasted time on avoidable problems, …

4 days ago
Models

Xiaomi's new open source, agentic AI coding harness MiMo Code beats Claude Code at ultra-long, 200+ step tasks

Xiaomi has open-sourced MiMo Code V0.1.0, a terminal-native AI coding assistant designed to handle complex, multi-step development tasks. The tool tar…

4 days ago
Models

Context compression finally works in production: new research cuts LLM input 16x without the accuracy hit

Context windows are becoming a computational bottleneck for large language models, especially as agents accumulate tokens from retrieved documents, re…

4 days ago
Models

Anthropic study shows AI needs hours, not weeks, to build exploits from security patches

Anthropic's security research demonstrates a critical acceleration in exploit development. The company's Mythos Preview model converted security patch…

4 days ago
Models

AI Weekly Issue #502: Your AI can now spend your money — Visa wired it into ChatGPT

Visa has integrated payment processing directly into ChatGPT, allowing AI agents to make purchases at any Visa-accepting merchant without user interve…

4 days ago
Models

Generative AI in the Real World: Agentic Systems Fundamentals with Maarten Grootendorst

Maarten Grootendorst, creator of BERTopic and developer relations engineer at Google DeepMind, argues that foundational AI techniques remain essential…

4 days ago
Models

Microsoft’s open-source SkillOpt automatically upgrades AI agent skills without touching model weights

Microsoft released SkillOpt, an open-source tool that automatically optimizes AI agent skills without modifying model weights. The system addresses a …

5 days ago
Models

What AI benchmarks miss about real-world performance

Enterprise AI teams optimize for the wrong metrics. They chase benchmark scores for compute and training throughput while ignoring what happens when m…

5 days ago
Models

Google's DiffusionGemma generates 256 tokens in parallel and self-corrects as it goes

Google released DiffusionGemma this week, an open-source experimental model that generates text using diffusion, the same iterative refinement approac…

5 days ago
Models

AI Weekly Issue #498: Anthropic files for an IPO. NVIDIA ships its stack.

Anthropic filed confidentially for a public offering today, marking a significant step toward going public for one of AI's largest independent laborat…

5 days ago
Models

AI Weekly Issue #496: Anthropic's Pentagon model is now everyone's model

Anthropic's release of Mythos marks a watershed moment in AI access. The model, previously restricted to cleared Pentagon contractors, is now availabl…

5 days ago
Models

Surprise upset: GPT-5.5 beats Claude Fable 5 on brutal new Agents’ Last Exam benchmark

UC Berkeley researchers have launched Agents' Last Exam (ALE), a demanding benchmark designed to evaluate whether AI models can execute complex, long-…

5 days ago
Models

Researchers say they trained a foundation model from scratch for about $1,500

Sapient researchers claim they trained a foundation language model from scratch for approximately $1,500, challenging the conventional wisdom that bui…

5 days ago
Models

Anthropic CEO calls for FAA-style regulation of powerful AI models: what enterprises should know

Dario Amodei, CEO of Anthropic, is pushing for regulatory oversight of advanced AI models patterned after the Federal Aviation Administration's approach…

5 days ago
Models

MassMutual's AI strategy: 12-month contracts, 30% productivity gains, zero lock-in

MassMutual built an AI strategy designed to avoid vendor lock-in and adapt as the technology landscape shifts. Instead of committing to multi-year con…

5 days ago
Models

Google's new open model DiffusionGemma generates text from noise instead of word by word

Google released DiffusionGemma, a 26-billion-parameter open model that abandons the traditional token-by-token text generation approach. Instead, the …

5 days ago
