May 30, 2026 · Saturday

xAI Releases Grok Build 0.1 API for Agentic Coding at $1/M Input Tokens

Public beta of the same model powering the Grok Build CLI — priced at $1 per million input tokens and $2 per million output. xAI calls it extremely cost-effective, intelligent, and fast.

xAI launched the public beta of Grok Build 0.1 API, priced at $1 per million input tokens and $2 per million output. The model, designed specifically for agentic programming, is the same one that powers the Grok Build CLI. With 2,938 likes and over 511,000 views on its announcement, the release signals strong developer appetite for coding-specialized models that can iteratively plan, execute, and debug complex software tasks. xAI positions this as a cost-effective alternative in an increasingly crowded agentic coding market.
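To make the quoted pricing concrete, here is a minimal cost sketch in Python. It uses only the two prices stated above ($1 per million input tokens, $2 per million output tokens). The session shape (40 turns, about 50,000 input and 2,000 output tokens each) is a hypothetical workload chosen for illustration, and the sketch ignores any caching discounts or other fees, which the announcement as summarized here does not cover.

```python
from decimal import Decimal

# Prices quoted in the announcement, in USD per one million tokens.
INPUT_PRICE_PER_M = Decimal("1")
OUTPUT_PRICE_PER_M = Decimal("2")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request at the quoted beta prices."""
    input_cost = Decimal(input_tokens) * INPUT_PRICE_PER_M
    output_cost = Decimal(output_tokens) * OUTPUT_PRICE_PER_M
    return (input_cost + output_cost) / TOKENS_PER_M


# Hypothetical agentic session (an assumption, not a measured workload):
# 40 turns, each sending ~50k tokens of context and producing ~2k tokens.
session_cost = sum(estimate_cost(50_000, 2_000) for _ in range(40))
print(f"${session_cost:.2f}")  # $2.16
```

Because agentic loops resend growing context on every turn, input tokens tend to dominate the bill even though output tokens cost twice as much per token.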

Codex Computer Use Lands on Windows with Mobile Remote Steering

OpenAI announced that Codex's Computer Use feature now supports Windows, allowing the coding agent to test applications, debug flows, and review work directly on Windows machines where project context lives. The ChatGPT mobile app can now connect to Windows machines, letting developers start, review, and steer tasks on the go while work continues on their desktop. The announcement drew 5,429 likes and over 546,000 views, reflecting strong developer demand for cross-platform agentic tooling.

Claude Opus 4.8 Adds Mid-Conversation System Instructions Without Breaking Prompt Cache

Anthropic released Claude Opus 4.8 with a key capability: system instructions can now be added mid-conversation without interrupting the prompt cache. This means more cache hits, lower cost, and reduced latency for API requests. The feature addresses a long-standing friction point for developers building multi-turn agentic applications where context must evolve dynamically. The update arrives alongside broader Opus 4.8 improvements, including increased honesty and a roughly 4x reduction in code defect omission rate.
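Why appending instructions preserves cache hits can be illustrated with a toy model of prefix caching. The sketch below is not Anthropic's API or implementation. It assumes, for illustration only, that a cache can reuse work for the longest unchanged leading run of messages (real systems match on token prefixes rather than message tuples). The function name and sample messages are hypothetical.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, text); a simplified stand-in for API messages


def shared_prefix_len(cached: List[Message], request: List[Message]) -> int:
    """Count how many leading messages of `request` match the cached conversation."""
    n = 0
    for old, new in zip(cached, request):
        if old != new:
            break
        n += 1
    return n


cached = [
    ("system", "You are a coding agent."),
    ("user", "Refactor utils.py"),
    ("assistant", "Done. Tests pass."),
]

# Editing the opening system prompt changes the first message,
# so none of the cached prefix can be reused.
edited_start = (
    [("system", "You are a coding agent. Prefer small diffs.")]
    + cached[1:]
    + [("user", "Now update the docs")]
)

# Appending the new instruction mid-conversation leaves the earlier prefix intact.
appended = cached + [
    ("system", "Prefer small diffs."),
    ("user", "Now update the docs"),
]

print(shared_prefix_len(cached, edited_start))  # 0
print(shared_prefix_len(cached, appended))      # 3
```

Under this simplified model, the longer the reusable prefix, the less of the conversation has to be reprocessed, which is where the cost and latency savings come from.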

Step-3.7-Flash: 198B MoE Vision-Language Model Debuts with Day-0 vLLM and NVIDIA Support

StepFun (Jieyue Xingchen) launched Step-3.7-Flash with immediate ecosystem backing. vLLM announced day-0 support, noting that the model is a sparse mixture-of-experts architecture with 198 billion total parameters, of which roughly 11 billion are active per token. It accepts native image and text input across a 256K context window, which suits long documents, multi-file repositories, and dense visual interfaces. NVIDIA simultaneously confirmed that NIM and NeMo acceleration endpoints are ready, and the model also appeared on Hugging Face in GGUF quantized format for local hardware. Multiple integration partners (OpenRouter, Kilocode, ModelScope, and ZenMux) have already onboarded the model into their stacks.
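The sparse mixture-of-experts figures translate into simple arithmetic: every parameter must be stored, but only a small share is used for each token. The Python sketch below uses the 198 billion total and roughly 11 billion active figures from the item. The bits-per-weight values are illustrative assumptions, and the memory estimate covers weights only (it ignores KV cache, activations, and the mixed-precision layouts that real GGUF quantizations use).

```python
TOTAL_PARAMS = 198_000_000_000   # total parameters, as reported
ACTIVE_PARAMS = 11_000_000_000   # approximate active parameters per token, as reported


def active_fraction(total: int, active: int) -> float:
    """Share of the model's parameters used for each token."""
    return active / total


def weight_memory_gb(params: int, bits_per_weight: int) -> float:
    """Rough weight-only memory footprint in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


print(f"{active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")  # 5.6%

# Assumed storage precisions, for illustration only.
for bits in (16, 8, 4):
    print(bits, "bits:", weight_memory_gb(TOTAL_PARAMS, bits), "GB")
```

At these assumed precisions the weights alone come to 396 GB, 198 GB, and 99 GB, which is why quantized releases matter for running a model of this size on local hardware even though per-token compute is closer to that of an 11B dense model.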

Cohere Command A+ Beats Mistral, DeepSeek, and Google Translate in Machine Translation

Cohere released Command A+, which sets a new high for the company in machine translation. The model opened a clear gap over open-source peers Mistral Medium 3.5, DeepSeek, and OpenAI's gpt-oss, and also over the proprietary Claude Opus 4.6. It outperformed Google Translate, the long-standing specialist system, as well, though Cohere acknowledged that RWS still performs better.

Visa Invests in AI Coding Platform Replit to Power Agentic Payments

Visa has invested in Replit, exploring how the AI coding platform can enable agentic payments. The partnership aims to help developers build payment applications more efficiently by leveraging Replit's AI-powered development environment. The move signals growing financial sector interest in agentic AI infrastructure as payment workflows become programmable and autonomous.

Step-3.7-Flash GGUF Lands on HuggingFace for Local Hardware

StepFun (Jieyue Xingchen) released a GGUF quantized version of Step-3.7-Flash on Hugging Face, enabling users to run the model on their own hardware. The release aligns with the broader push toward local-first AI, allowing developers to download and run frontier models without API keys or cloud dependencies.

"The words or the language, as they are written or spoken, do not seem to play any role in my mechanism of thought."
Albert Einstein, cited by François Chollet — on the limits of natural language for invention

Anthropic Reports Annualized Revenue Run-Rate of $47 Billion

Simon Willison relayed Anthropic's self-reported annualized revenue growth, noting that Axios founder Jim VandeHei said he could not find any company in any industry in any era that has scaled organic revenue this quickly at this level. When Anthropic was at $30 billion, the claim already seemed extraordinary; at $47 billion, it defies comparison. The figure underscores the breakneck commercialization of frontier AI models.

GPT-5 Pro Series Remains Unbeaten on Single-Shot Hard Problems Since Last Summer

Ethan Mollick observed that GPT-5 Pro series models have consistently been the best at solving the hardest problems in a single attempt since summer 2025, with no real competition emerging in all that time. The observation highlights OpenAI's sustained lead in frontier reasoning despite an increasingly crowded model landscape.

Claude Dynamic Workflows Can Launch Hundreds of Subagents for Large-Scale Tasks

Commentator op7418 noted that Claude's newly released dynamic workflows may be more significant than the Opus 4.8 model update itself. The system extends concurrent subagent logic, potentially launching hundreds of subagents to tackle massive tasks such as researching an entire codebase or generating comprehensive reports in a single session.

DeepSeek's Infrastructure Engineering Is So Good the Industry Politely Pretends It Doesn't Exist

Teortaxes Tex remarked that DeepSeek is so excellent at infrastructure engineering that the rest of the industry has to pretend they are operating at a loss or that it simply is not happening. The observation points to the competitive tension around DeepSeek's cost efficiency and its underappreciated operational excellence in serving large-scale AI workloads.

France Releases Advanced Open-Source LLM Under Apache 2.0 License

France has released an advanced large language model under the permissive Apache 2.0 license, targeting both personal and enterprise use cases. The move represents a significant European contribution to the open-weight AI ecosystem, offering a sovereign alternative to American and Chinese frontier models.

AI BRIEFS 05·30

PRODUCT

Cursor Introduces Auto-Review Mode

Cursor released Auto-review mode, allowing agents to run tool calls with fewer approval prompts while executing more safely. The feature reduces friction in agentic coding loops.

MODEL

Surya OCR 2 Released with 650M Parameters

VikParuchuri announced Surya OCR 2, which scores 83.3% on the olmocr benchmark and 87% on an internal 91-language benchmark, positioning it as the top OCR model under 3B parameters.

SAFETY

OpenAI Launches Rosalind Biodefense Program

OpenAI announced the Rosalind biodefense project to accelerate AI-driven biosafety and pandemic preparedness, expanding GPT-Rosalind access to U.S. government and allied partners.

PRODUCT

Stanford OpenJarvis Runs Locally via Ollama

OpenJarvis, a local-first personal AI developed by Stanford HazyResearch and Scaling Intelligence Lab, can now run via Ollama as part of the Intelligence Per Watt research initiative.

PRODUCT

Runway Aleph 2.0 Exclusive to Adobe Firefly

Adobe Firefly has exclusive access to Runway's Aleph 2.0 video generation model, which lets users generate new clips by editing existing videos. Available through June 1.

PAPER

Qwen-VLA Unifies Vision-Language-Action Across Robots

Qwen-VLA proposes unified vision-language-action modeling across tasks, environments, and robot embodiments, advancing general-purpose robotic AI.

MODEL

Cartesia Ink-2 Tops Streaming Speech-to-Text Leaderboard

Cartesia released Ink-2, ranking first on the streaming speech-to-text leaderboard, optimized for low-latency transcription.

PRODUCT

Luma Agents Auto-Generate Promotion Graphics

Luma released Luma Agents, which automatically generate full promotion graphics from input content and marketing hooks, described as a creative team multiplier.

PRODUCT

llama.cpp Launches Official Website llama.app

llama.cpp launched its official site llama.app, enabling frontier models to run locally without API keys, supporting hardware from phones to clusters.

vLLM Integrates Open-Source Rust Tokenizer fastokens

vLLM now includes fastokens, an open-source Rust BPE tokenizer built by CrusoeAI and NVIDIA Dynamo, compatible with DeepSeek, Qwen, Kimi, MiniMax, and Nemotron models.

vLLM Rolls Out Two Major RL Upgrades

vLLM released a native weight synchronization API and an improved pause/resume feature for asynchronous RL training, standardizing weight transfer with optimized NCCL and CUDA IPC support.

Opus 4.8 ParseBench: Tables Up, Charts Down

LlamaIndex published ParseBench results for Opus 4.8, showing gains in tables and semantic formatting but slight drops in chart parsing and content faithfulness, with a minor page-price increase.

GPIC Dataset: 100M VLM-Annotated Image-Text Pairs

Keshi Geyan released the GPIC dataset containing 100 million VLM-captioned image-text pairs for visual generation benchmarking.

NVIDIA Blackwell Ultra Delivers 50x Throughput Per Megawatt

NVIDIA promoted its AI factory vision, with Blackwell Ultra achieving 50x higher throughput per megawatt, converting energy into continuous intelligence.

Simon Willison Reviews Claude Opus 4.8

In his review of Opus 4.8, Simon Willison describes modest but real improvements: increased honesty, the lowest hallucination rate, unchanged pricing, and a minimum cacheable prompt length reduced from 4096 to 1024 tokens.

Step 3.7 Flash Gets Day-0 NVIDIA NIM and NeMo Support

StepFun (Jieyue Xingchen) confirmed that NVIDIA NIM, NeMo, and GPU-accelerated endpoints are ready for Step 3.7 Flash on launch day.

Terence Tao: AI Frees Researchers to Pursue Bolder Ideas

OpenAI shared mathematician Terence Tao's view that AI creates more room for experimentation, enabling researchers to test unexpected paths and discover what might otherwise stay out of reach.

Red Hat Speculators v0.5.0 Adds DFlash Training Support

Red Hat AI released Speculators v0.5.0, adding DFlash training support for drafting all tokens in a single pass via block diffusion, alongside two other major updates.