New model releases, benchmarks, and comparisons — from frontier labs and open-source projects.
Satya Nadella warns that AI could hollow out entire industries, echoing the damage done by globalization
Microsoft CEO Satya Nadella warns that AI threatens to concentrate economic value among a handful of frontier models, potentially hollowing out entire…
When deep research isn't enough for your business: Sakana AI launches 'ultra deep research' agent for 100+ page reports in 8 hours
Tokyo-based Sakana AI launched Marlin, its first commercial product, positioning it as a "Virtual CSO" for enterprise research. The autonomous agent a…
85% of IT teams claim every AI agent is under control. Only 42% actually know who owns them.
IT teams are masking a severe governance crisis behind confident rhetoric. Eighty-five percent claim ownership exists for every AI agent they deploy. …
The PM’s Playbook for Shipping AI Features That Actually Work in Production
Product managers chasing AI features face a brutal reality: what works …
Inside the fight over Claude Mythos 5
Anthropic faced a weekend confrontation with the Trump administration over its latest AI model releases. The company received a US export control dire…
Vibe coding can build your pipeline. It can't explain it six months later
AI coding agents are accelerating data engineering by generating pipelines, orchestrations, and infrastructure from natural language prompts. This spe…
AI Weekly Issue #503: Washington just repriced frontier AI
The U.S. government moved to restrict access to Anthropic's latest frontier models within days of their release, triggering a broader reassessment of …
MCP solved tool calling. A2A solved coordination. What solves transport?
Distributed computing keeps cycling through the same pattern: competing standards emerge, then one wins through simplicity. REST beat CORBA, DCOM, and…
Pokémon Go players unwittingly contributed to tech with military drone uses
Pokémon Go players have unknowingly supplied training data for military drone technology. The game's massive user base generated billions of geotagged…
The FBI built a small town to simulate cyberattacks
The FBI has constructed a full-scale fake town in Huntsville, Alabama to train agents on defending against cyberattacks. The Cyber Range spans 22,000 …
Google Cloud's Open Knowledge Format turns scattered docs into Markdown files for AI agents
Google Cloud released Open Knowledge Format (OKF), a standardized specification for converting scattered organizational documents into structured Mark…
AI coding agents find the right file but miss the exact lines that matter, study shows
AI coding agents excel at locating the correct file when fixing bugs but fail to identify the specific lines that need modification. A new benchmark c…
The Subsidy Ended: What Tool-Using Agents Actually Cost
GitHub Copilot shifted to metered billing on June 1, ending unlimited usage for paid subscribers and introducing an explicit cost structure that revea…
Cheaper, faster, and culturally aware, Avataar’s video AI is built for India’s scale
Avataar AI launched a distilled video generation model priced at $0.005 per second, targeting India's massive market at a fraction of typical AI video…
Google sues Chinese cybercrime network that used Gemini to automate scams
Google filed suit against a Chinese cybercrime network that weaponized its Gemini AI model to automate large-scale scam operations. The defendants all…
Open model Kimi K2.7 Code undercuts GPT-5.5 and Claude by up to 12x on price per token
Moonshot AI released Kimi K2.7 Code, an open-weights model with one trillion parameters designed for programming tasks. The model trails GPT-5.5 and C…
US government forces Anthropic to disable Claude Fable 5 and Mythos 5 for all customers worldwide
The US government has ordered Anthropic to disable Claude Fable 5 and Mythos 5 globally, citing jailbreak vulnerabilities. Anthropic is complying but …
AI Weekly Issue #495: Musk, Zuckerberg killed Trump's AI safety order in three phone calls
Elon Musk, Mark Zuckerberg, and David Sacks blocked Trump's proposed AI safety executive order through three phone calls on Wednesday night, derailing…
AI Weekly Issue #484: Your AI chats can be used against you in court
Conversations with AI chatbots now carry legal consequences. Courts are treati…
Amazon CEO reportedly raised Anthropic model concerns before government crackdown
Amazon CEO Andy Jassy raised security and safety concerns about Anthropic's AI models before the company restricted global access to Claude 3.5 Sonnet…
Anthropic blocks all public access to Claude Fable 5, Mythos 5 following US government order — what enterprises should do
The US government issued an export control directive ordering Anthropic to suspend all access to Claude Fable 5 and Claude Mythos 5 for foreign nation…
New AI model called "Count Anything" does exactly what it says, and that's harder than it sounds
Researchers have unveiled "Count Anything," an AI model that counts objects in images across vastly different contexts using only a text prompt. The s…
Google Research's Gemini-SQL2 tops text-to-SQL benchmarks by a wide margin
Google Research unveiled Gemini-SQL2, a specialized system that converts natural language queries into executable SQL commands. Built on the Gemini 3.…
Microsoft's SkillOpt boosts GPT-5.5 by using nothing but a trained Markdown file
Microsoft researchers partnered with three Chinese universities to develop SkillOpt, a novel optimization technique that improves AI agent performance…
Claude Fable 5 outpaces GPT-5.5 by 13 points on FrontierMath's toughest problems
Anthropic's Claude Fable 5 has achieved 88 percent accuracy on FrontierMath's hardest problem tier, substantially outperforming OpenAI's GPT-5.5, whic…
Kimi K2.7-Code cuts thinking tokens 30% — but practitioners say the benchmarks don't check out
Moonshot AI unveiled Kimi K2.7-Code this week, claiming the open-source coding model reduces inference "thinking tokens" by 30% while delivering doubl…
Google researchers introduce 'faithful uncertainty,' allowing LLMs to offer best guesses instead of hallucinations
Google researchers have identified a path forward in the stubborn hallucination problem plaguing large language models. Their solution centers on "fai…
AI Weekly Issue #490: Anthropic just had AI's biggest week of 2026
Anthropic transformed its market position in five days, executing moves that reshape AI infrastructure and enterprise adoption. The company's Q1 reven…
Radar Trends to Watch: June 2026
Autonomous agents are moving beyond isolated task execution into full operational management. Cloudflare and Stripe have deployed an agent capable of …
Anthropic’s safety warnings may have just backfired — the government has pulled the plug on its most powerful AI
Anthropic has publicly disputed a government decision to halt deployment of one of its most powerful AI models, citing safety concerns over a discover…
NanoClaw and JFrog launch 'immune system' to block AI agents from downloading malicious code
NanoClaw and JFrog have launched a security integration designed to prevent AI agents from downloading malicious code. The partnership hardwires NanoC…
PixelRAG beats text parsers on accuracy and cuts AI agent token costs 10x
A new retrieval-augmented generation system bypasses text parsing entirely, improving accuracy while slashing token costs for AI agents tenfold. P…
The AI industry's platform trap is starting to look a lot like Microsoft's
Anthropic faces mounting pressure from customers, partners, and investors over deliberate restrictions on its new Mythos model paired with the company…
AI Weekly Issue #485: When AI teaches AI, it teaches in secret
AI model training has entered a phase where artificial intelligence systems teach other AI systems in ways humans cannot directly observe or understan…
I Let an AI Agent Run 40 Experiments While I Slept
An AI agent running unsupervised machine learning experiments delivered measurable improvements overnight but also wasted time on avoidable problems, …
Xiaomi's new open source, agentic AI coding harness MiMo Code beats Claude Code at ultra-long, 200+ step tasks
Xiaomi has open-sourced MiMo Code V0.1.0, a terminal-native AI coding assistant designed to handle complex, multi-step development tasks. The tool tar…
Context compression finally works in production: new research cuts LLM input 16x without the accuracy hit
Context windows are becoming a computational bottleneck for large language models, especially as agents accumulate tokens from retrieved documents, re…
Anthropic study shows AI needs hours, not weeks, to build exploits from security patches
Anthropic's security research demonstrates a critical acceleration in exploit development. The company's Mythos Preview model converted security patch…
AI Weekly Issue #502: Your AI can now spend your money — Visa wired it into ChatGPT
Visa has integrated payment processing directly into ChatGPT, allowing AI agents to make purchases at any Visa-accepting merchant without user interve…
Generative AI in the Real World: Agentic Systems Fundamentals with Maarten Grootendorst
Maarten Grootendorst, creator of BERTopic and developer relations engineer at Google DeepMind, argues that foundational AI techniques remain essential…
Microsoft’s open-source SkillOpt automatically upgrades AI agent skills without touching model weights
Microsoft released SkillOpt, an open-source tool that automatically optimizes AI agent skills without modifying model weights. The system addresses a …
What AI benchmarks miss about real-world performance
Enterprise AI teams optimize for the wrong metrics. They chase benchmark scores for compute and training throughput while ignoring what happens when m…
Google's DiffusionGemma generates 256 tokens in parallel and self-corrects as it goes
Google released DiffusionGemma this week, an open-source experimental model that generates text using diffusion, the same iterative refinement approac…
AI Weekly Issue #498: Anthropic files for an IPO. NVIDIA ships its stack.
Anthropic filed confidentially for a public offering today, marking a significant step toward going public for one of AI's largest independent laborat…
AI Weekly Issue #496: Anthropic's Pentagon model is now everyone's model
Anthropic's release of Mythos marks a watershed moment in AI access. The model, previously restricted to cleared Pentagon contractors, is now availabl…
Surprise upset: GPT-5.5 beats Claude Fable 5 on brutal new Agents’ Last Exam benchmark
UC Berkeley researchers have launched Agents' Last Exam (ALE), a demanding benchmark designed to evaluate whether AI models can execute complex, long-…
Researchers say they trained a foundation model from scratch for about $1,500
Sapient researchers claim they trained a foundation language model from scratch for approximately $1,500, challenging the conventional wisdom that bui…
Anthropic CEO calls for FAA-style regulation of powerful AI models: what enterprises should know
Dario Amodei, CEO of Anthropic, is pushing for regulatory oversight of advanced AI models patterned after the Federal Aviation Administration's approach…
MassMutual's AI strategy: 12-month contracts, 30% productivity gains, zero lock-in
MassMutual built an AI strategy designed to avoid vendor lock-in and adapt as the technology landscape shifts. Instead of committing to multi-year con…
Google's new open model DiffusionGemma generates text from noise instead of word by word
Google released DiffusionGemma, a 26-billion-parameter open model that abandons the traditional token-by-token text generation approach. Instead, the …