Claude Sonnet 5: the most agent-capable Sonnet yet
Top-tier coding and tool-use performance at Sonnet pricing with a 1M context window — now the default in Claude Code and available on all platforms.
Anthropic released Claude Sonnet 5, positioning it as the Sonnet series model best suited for agentic tasks. The model delivers top-tier performance on coding and tool use, features a 1 million token context window with up to 128K max output, and becomes the new default model in Claude Code for Pro users. It is available everywhere on the Claude Platform, including the API and Managed Agents. Claude Sonnet 5 is designed for autonomous operation — capable of making plans, using tools like browsers and terminals, and running multi-step workflows independently. The new tokenizer increases English input cost by approximately 1.4x and Spanish by 1.33x, while Chinese Mandarin costs remain roughly unchanged. API sampling parameters such as temperature are no longer supported; the model defaults to adaptive thinking mode.
Google DeepMind unveils Nano Banana 2 Lite and Gemini Omni Flash
The fastest Gemini image model and a new video generation model, both available via API and AI Studio.
Google DeepMind announced two major releases: Nano Banana 2 Lite, the fastest and cheapest Gemini image model, and Gemini Omni Flash, a video generation and editing model now accessible through the Gemini API and Google AI Studio. Nano Banana 2 Lite generates images in approximately 4 seconds at $0.034 per image, while Gemini Omni Flash enables developers to create and edit high-quality videos programmatically. Both models are now live across Runway and other integrations.
OpenAI introduces GeneBench-Pro for biological data
A challenging benchmark for agents navigating complex biological datasets.
OpenAI released GeneBench-Pro, a research-level benchmark designed to evaluate how well AI agents can navigate messy biological data, choose the right analysis path, and make judgment calls that real computational research depends on. The benchmark tests agents on processing complex multi-modal biological datasets and selecting analytical strategies autonomously.
U.S. lifts export controls on Anthropic models
Commerce Department removes restrictions on Claude Fable 5 and Mythos 5.
Anthropic received notice that the U.S. Department of Commerce has removed export controls on Claude Fable 5 and Mythos 5. The company announced it will begin restoring access tomorrow and share further updates shortly. The move marks a significant policy shift for advanced AI model export restrictions.
NVIDIA inference software boosts DeepSeek V4 performance by 5×
Token costs driven to one-fifth in a single month on Blackwell.
NVIDIA announced that software optimizations on Blackwell GPUs have increased DeepSeek V4 inference performance by up to 5× within one month, reducing token costs to roughly one-fifth of previous levels. The company emphasized that its inference software stack continues to drive down costs long after AI infrastructure is deployed, making it more economical to run large models at scale.
Vercel and Shopify rebuild Hydrogen
Agent-first, runtime-agnostic rewrite of the headless commerce framework.
Vercel announced a partnership with Shopify to rebuild the open-source framework Hydrogen from scratch. The new version is agent-first, runtime-agnostic, and runs anywhere JavaScript does. A Next.js developer preview is now available. Vercel will serve as design partner for the open-source project.
ASPIRE: robotic skill library that never resets to zero
Coding agents observe multimodal sensory traces to self-evolve reusable skills.
Researchers proposed the ASPIRE framework, where robots use coding agents to observe multimodal sensory traces from simulation and real environments, automatically generating and accumulating reusable skills. Under ASPIRE, a robot solving its 100th task is no longer as clueless as solving its first — skills compound indefinitely rather than resetting.
vLLM open-sources semantic router for LLM queries
Routes queries by intent to the most suitable model, optimizing resource allocation.
vLLM released an open-source semantic router based on a hybrid model-of-models architecture. The router dispatches LLM queries to the most appropriate model based on request intent — routing a simple weather query differently from a legal contract analysis. It supports custom routing policies and functions including text classification and PII detection.
OpenAI debugs a year of data infrastructure crashes
One hardware fault and an 18-year-old open-source bug uncovered.
OpenAI shared its experience diagnosing a year-long series of crashes in its data infrastructure. The investigation uncovered one hardware-level issue and a second bug that had gone unnoticed in open-source code for 18 years. The postmortem details the debugging methodology used to trace both root causes.
Claude Desktop public beta now on Linux
Ubuntu and Debian users get Claude Code, Cowork, and chat on desktop.
Claude Desktop is now available in public beta on Linux, supporting Ubuntu and Debian. Paid-plan users gain access to Claude Code, Claude Cowork, and chat features in a first-class desktop experience alongside the existing browser and terminal interfaces.
Cursor integrates Sonnet 5 with 57% on CursorBench
A meaningful step up from Sonnet 4.6's 49% benchmark score.
Claude Sonnet 5 is now available in Cursor, scoring 57% on CursorBench compared to Sonnet 4.6's 49%, a significant performance improvement. The integration makes the new model available to Cursor users for coding tasks immediately.
Vercel Services unifies frontend and backend deployment
Atomic deploys, single preview URL, and private networking between services.
Vercel launched Services, enabling developers to deploy frontend and multiple backend services as a single project. The platform supports atomic deployments and rollbacks, a single preview URL for the entire application, and private networking between services — all within one Vercel project.
Seed Audio 1.0: voice and dubbing in 18 languages
Voice change, text narration, and multilingual video dubbing.
Higgsfield launched Seed Audio 1.0, an audio model supporting voice transformation, text-to-speech narration, and video dubbing into 18 languages. It is available on the Higgsfield platform and via Claude MCP.
Stanford: 71.3% of ChatGPT queries work on local models
Study suggests enterprise AI workloads could shift to on-device inference.
A Stanford University study found that 71.3% of ChatGPT queries could be accurately answered by local models. Hugging Face CEO Clement Delangue noted that a major portion of enterprise AI workloads could potentially run locally for free, dramatically reducing reliance on expensive frontier API costs while also lowering data exposure risks.
Claude Science: an AI workbench for researchers
Positioned as Claude Code for the life sciences domain.
Anthropic introduced Claude Science, a dedicated AI workbench for scientific researchers. Positioned as a Claude Code for life sciences, it aims to replicate the latter's transformative impact on programming within the research community, with CEO Dario Amodei expressing optimism about its potential.
Loops are now a key part of how we get AI agents to iterate at length to build software.
Andrew Ng on Loop Engineering
Loop Engineering: the new paradigm for AI agent iteration
Andrew Ng published an analysis of Loop Engineering, a term that went viral after mentions by Boris Cherny of Claude Code and Peter Steinberger of OpenClaw. Ng argues that loops are now fundamental to getting AI agents to iterate effectively over long horizons in software construction. Citing real-world examples including Claude Code, he explores how structured iteration loops enable agents to build software at increasing levels of complexity.
Simon Willison on Sonnet 5: new tokenizer raises costs
Simon Willison published detailed notes on Claude Sonnet 5, highlighting that the new tokenizer makes English input approximately 1.4× more expensive and Spanish about 1.33× more expensive, while Simplified Mandarin costs remain roughly unchanged. The API no longer supports sampling parameters like temperature; the model defaults to adaptive thinking. Despite the effective ~30% price increase from the tokenizer change, the base pricing matches Sonnet 4.6, offering near-Opus 4.8 capability at a fraction of the cost.
From chatbots to agents: AI reshapes how we work
Ethan Mollick wrote about how the rapid rise in AI abilities is transforming workplace AI usage and triggering sudden shifts in policies and markets. Evaluations from METR and the UK AI Safety Institute show the human programming hours completed per single AI prompt continue to climb steeply. Opus 4.7 independently ran for 14 hours to complete work estimated at 2–17 weeks for a human, at a cost of $251. Within OpenAI, 25% of employees now run at least four agents simultaneously each week, spanning both technical and non-technical roles including legal and HR.
Claude Code From Scratch: core architecture in 4,300 lines
An open-source ebook and codebase has been published that recreates Claude Code's core architecture using approximately 4,300 lines of code, available in both TypeScript and Python versions. Rather than reading through Claude Code's 500,000-line codebase, developers can study the recreated Agent Loop, 13 tools with parallel execution support, and other core components in a compact, educational format.
Gemini API skill library reaches 96% agent code accuracy
The Gemini API introduced gemini-skills, a lightweight context injection mechanism designed to address model knowledge staleness. The library includes skills for video editing, text-to-video generation, image-referenced video generation, and first-frame-to-video conversion, along with input preprocessing tools. Evaluations show that adding skills improves agent-generated API code accuracy to 87% on Gemini 3 Flash and 96% on Gemini 3.1 Pro. Skills can be installed via Vercel or the Context7 CLI.
Why Anthropic worries OpenAI
Industry analysis suggests Anthropic keeps OpenAI on edge because of the suspicion that GLM 5.2 at 10 trillion parameters would not outperform Fable 5, and even GPT 5.5 at 10T may fall short. The commentary hints that Fable may not represent the full Mythos architecture, possibly around 3T parameters, and that scaling laws optimized for large models do not guarantee proportional performance gains — challenging the assumption that bigger models always win.
Managed Agents get streaming events, webhooks, and credential scoping
Anthropic added streaming session event deltas, per-session agent overrides, new webhook event types, reverse pagination, and credential injection scope control to Claude Managed Agents.
Seedance 2.0 Mini brings fast video generation to canvas
Luma released Seedance 2.0 Mini, supporting rapid video generation and in-canvas iteration, with Luma Agents handling planning, generating, and refining across creative stages.
LlamaParse MCP extracts structured data from documents
LlamaParse MCP can now automatically pull structured data from contracts, invoices, and reports, giving agents direct access to knowledge bases beyond basic parsing and classification.
Claude Sonnet 5 available for Pro and Max subscribers
Perplexity now offers Claude Sonnet 5 to Pro and Max users, and it can be selected as the orchestrator model in the Computer feature.
Bridgewater fine-tunes models for financial news filtering
Bridgewater Associates, as a Tinker customer, shared how they fine-tune models to identify interesting financial news, outperforming any frontier model at lower cost.
OSWorld2.0 benchmarks computer-use agents on real-world tasks
OSWorld2.0 was released as a benchmark for evaluating computer-using agents on long-horizon real-world tasks, pushing beyond short-duration evaluations.
Ornith-1.0-35B now callable within Claude Code
The Ornith-1.0-35B model is now integrated into HuggingFace Claude, allowing direct invocation inside Claude Code via the hf-claude bridge.
Build a Reasoning Model (From Scratch) published
Author Sebastian Raschka published a 440-page full-color book covering inference scaling, reinforcement learning, and distillation for building reasoning models from scratch.
Sonnet 5: near-Opus capability at 40% of Opus API price
Anthropic released Claude Sonnet 5 to replace Sonnet 4.6 as the default model, with agent capability approaching Opus 4.8 while costing only 40% of Opus via API.
Claude Code accused of watermarking Chinese proxy users
Reports on Reddit and GitHub indicate Claude Code silently checks if a user accesses the service via a Chinese proxy and embeds a nearly invisible Unicode watermark in the system prompt sent to Anthropic.
Xiaomi MiMo reveals system prompts including persistent agent
The Xiaomi MiMo codebase contains system prompts for several model families, including beast.txt, designed for extremely persistent autonomous goal-chasing agents that must iterate until problems are solved.
Tesla Cybercab drives autonomously in Austin
Elon Musk shared video footage of the Cybercab driving in Austin without a steering wheel or pedals, showcasing Tesla's full self-driving progress on public roads.
OpenAI Devs: agent engineering shifts toward direction-setting
As agents take on longer-running work, engineering shifts from coding to setting direction, reviewing agent output, and designing better systems around the models.
Runway integrates Gemini Omni Flash for video generation
Runway now supports generating and editing videos using Gemini Omni Flash via prompts, images, or video. Users can access the model through Runway's Agent interface.
Runway partners with Japan's MIXI for world models in gaming
Runway announced a strategic collaboration with MIXI, one of Japan's largest gaming and entertainment companies, to deploy Runway and explore world model applications in gaming.
Kling AI works win three Lions at Cannes 2026
Films created with Kling AI took home a Silver in Film and two Bronze awards in the Film B2B and AI Craft categories at the Cannes Lions International Festival of Creativity.
Step 3.7 Flash ranks top 10 on OpenRouter this month
Step 3.7 Flash processed 4.29 trillion tokens on OpenRouter this month, used by developers in real agent tasks, coding, and long-context workflows.
Perplexity CEO confirms Sonnet 5 as Computer orchestrator
CEO Arav Srinivas announced that Sonnet 5 is now the default orchestrator model for Computer users on Pro and Max plans.
Replit CEO: Etched chip designed for modern inference from scratch
Replit CEO Amjad Masad argued AI is expensive partly because most workloads run on generic pre-LLM hardware, while Etched is the first system purpose-built for modern inference workloads.
Tri Dao on Etched: custom silicon could cut inference cost 10×
Tri Dao praised Etched for completing chip design and tape-out within two years, hard-coding attention into silicon and achieving very high model flop utilization. He expects this hardware to bring intelligence costs down by an order of magnitude.
Nathan Lambert visits Meituan: why diverse companies build models
Nathan Lambert shared his visit to Meituan, calling it one of the best open model builders. He noted that Meituan exemplifies how varied companies can succeed in AI model development, baffling observers about their motivations.
Tri Dao recommends deep dive on AI model economics
Tri Dao shared an article analyzing the economic sustainability of open versus closed models and inference providers.
Former Gemini researcher impressed by Etched benchmarks
A former Gemini team member was deeply impressed by Etched's presentation, noting the chip has taped out and demonstrated live benchmarks.
Twitter launches MCP for AI-driven data analysis
Twitter opened its MCP interface, allowing users to have AI automatically organize, summarize, and analyze tweet data via API.
Cursor iOS app launched with lock-screen progress display
Cursor released an iOS app showing task progress on the lock screen and sending interface screenshots for user review upon completion.
LongCat-2.0 arriving on Hugging Face shortly
The LongCat-2.0 model is set to be released on Hugging Face with details to follow.
World models improve coding agent performance
Researchers find that code world models benefit coding agents, analogous to how world models help embodied agents in physical environments.
Larger LLMs improve simultaneously across coding, ethics, and medicine
Ethan Mollick notes the peculiar generality of LLMs: a bigger model better at coding is also better at ideation, ethical advice, medicine, and math.
Using AI agents for microservices system design
A developer proposed placing all microservices in one workspace with per-service docs so AI agents understand responsibility boundaries for system design.
Epoch AI: largest US cluster approaches 1M H100 GPUs
Epoch AI's open database using satellite data reveals the largest U.S. AI cluster nearing 1 million H100 GPUs, raising questions about policy benefits.
Capturing organizational value from high-intelligence AI
Ethan Mollick argues enterprises must design organizations like high-human-capital firms to capture the value of increasingly capable AI.
Etched hits $1B orders with 10× inference speed
Commentators report Etched now has $1 billion in orders and delivers 10× inference speeds over state-of-the-art competition.
Etched Sohu: foundational layer for intelligent computing
MillionInt describes Etched Sohu as a foundational chip that the intelligence layer relies on, praising its dedication and innovation.