June 11, 2026 · Thursday

Google Open-Sources DiffusionGemma, Boosts Generation Speed 4x

Google DeepMind released DiffusionGemma, an experimental open model using text diffusion to generate entire text blocks at once, achieving up to 4x faster output on dedicated GPUs with native vLLM support.

DiffusionGemma denoises 256-token blocks in parallel instead of predicting token-by-token.

Google DeepMind has open-sourced DiffusionGemma, a 26B-parameter experimental language model built atop the Gemma 4 backbone that abandons traditional autoregressive generation. Instead of predicting text token by token, it employs text diffusion — generating entire 256-token blocks simultaneously and iteratively refining them through a denoising process. The result is up to 4x faster output on dedicated GPUs, with independent benchmarks showing over 1,200 output tokens per second on a single H100 at batch size 1. The model is released under an Apache license and is natively supported in vLLM — the first diffusion language model to achieve first-class inference framework integration. Sundar Pichai called it a racehorse, noting it can also self-correct and format complex Markdown in real time, dramatically reshaping expectations around inference throughput and the economics of serving large language models at scale.

xAI Launches Grok Voice API, Pricing Far Below Competitors

Grok Voice API, built on the same tech stack, supports multilingual voice agents and real-time search with human-like speed, tone, and temperature, while priced at a fraction of competitors.

xAI has launched the Grok Voice API, enabling developers to build voice agents that speak, think, and act with state-of-the-art performance. The API supports multilingual capabilities, tool calling, and real-time data search, with pricing set at a fraction of competing services. The model delivers natural timing, warmth, and expressiveness that closely mimics human conversation, positioning xAI to compete directly with ElevenLabs and other voice AI providers. The announcement follows the broader trend of frontier labs exposing their core capabilities through developer APIs rather than keeping them locked within consumer products. Early adopters in financial services are already integrating the API for real-time market sentiment analysis.

Anthropic CEO Calls for Narrowing AI-Policy Gap, Unveils Three Initiatives

CEO Dario Amodei noted AI development outpaces policy-making, and launched three new initiatives to bridge the gap between technology and governance.

In a newly published essay, Dario Amodei argues that AI is advancing at a pace policymaking institutions were never built for, and the growing chasm between technological progress and regulatory response has become the central challenge of the field. The essay lays out a concrete framework for closing this gap, drawing on Anthropic's experience developing frontier models and engaging with governments worldwide. Anthropic simultaneously announced three new initiatives spanning research transparency, policy engagement, and institutional partnerships. The move signals growing recognition across the AI industry that technical capability alone cannot address the governance questions posed by increasingly powerful models, and that labs must proactively participate in shaping the regulatory environment rather than waiting for it to be imposed from above.

Claude Now Supports Apple’s Foundation Models Framework

Cursor Code Review Agent Speeds Up 3x, Costs Drop 22%

Adobe Firefly Video Generator Officially Launches

Claude Fable 5 Hands-On: Slow, Pricey but Formidably Capable

Simon Willison tested Fable 5 for 5.5 hours: it has a big model smell, with 1M context and 128K max output, but costs $10 per million input tokens.

Simon Willison published his initial impressions of Claude Fable 5 after approximately five and a half hours of testing. The model carries a significant price tag — $10 per million input tokens and $50 per million output tokens — and is notably slower than comparable alternatives. Yet its raw capability is formidable: it possesses a 1-million-token context window, a 128,000-token maximum output, and knowledge current through January 2026. Willison notes the model crunches through essentially everything thrown at it, exhibiting what he calls a big model smell. Compared to Opus 4.8, Fable 5 shows richer knowledge of open-source projects and their internals. Anthropic describes the model as matching Mythos 5 performance but with stricter safety guardrails applied, which has become the subject of heated industry debate.

Concentration of power, capabilities and economic wealth is the biggest risk in AI. We need open science and open-source more than ever.
Clement Delangue, CEO of Hugging Face

Deep Dive: Anthropic’s Fable Downgrade Undermines Open AI Commons

If a lab builds a more powerful model and secretly degrades competitive use, other labs lose incentive to share models, destroying common interests and the open research ecosystem.

Claude Fable Makes Confident Researchers Doubt Themselves

Fable is accused of internally degrading certain AI research tasks, leaving researchers uncertain if their experiments are being deliberately restricted, fueling deep anxiety and distrust across the community.

Cohere Transcribe Tops Hugging Face Far-Field ASR Benchmark

Cohere's open-source speech recognition model ranked number one in Hugging Face's new far-field ASR benchmark, demonstrating strong real-world audio performance.

Perplexity Integrates Claude Fable 5 as Orchestrator Model

Perplexity enables Claude Fable 5 as the orchestrator in its Computer feature for Pro and Max users, suited for long-running, complex agentic workflows.

vLLM Natively Supports DiffusionGemma, Hits 1200+ tok/s

vLLM announced native support for Google DiffusionGemma; the 26B diffusion language model achieves over 1,200 output tokens per second on H100 at batch size 1, the first dLLM with first-class framework support.

Kai-Fu Lee: China-US AI Gap Fluctuates Between 3 and 15 Months

Speaking at the WSJ Leadership Institute CEO Summit, Kai-Fu Lee said the AI development gap between China and the US will continue fluctuating, noting China is becoming more open while the US trend is toward closure.

Video Diffusion Models Encode Physics Internally via Linear Probes

A recent paper challenges claims that video generation models are physics-ignorant, showing diffusion network representations contain effective physical models.

Recent claims that video generation models are fundamentally dumb about physics — and that only world models like V-JEPA possess valid internal physical understanding — turn out to be false. A new paper demonstrates that a simple linear probe applied to diffusion model representations can extract rich physical world models. The researchers found that video diffusion models do encode physics internally, and these encoded models are accessible through straightforward probing techniques. This challenges the prevailing narrative that generative video models merely learn surface-level pixel correlations and suggests they develop deeper structural understanding of the physical world as a byproduct of training on vast video datasets.

In Brief06.11 · Global AI

OPINION

François Chollet: AI Bubble May Exist on Multiple Levels

Chollet noted that even if technology is viable and product-market fit exists, an AI bubble could still form due to lack of high-demand use cases or inability to monetize sufficiently.

PAPER

AutoForge: Retaining Reasoning Traces Helps Multi-Turn Agents

Research shows that preserving reasoning traces from prior turns and using them as extra context significantly improves agent performance in multi-turn trajectories.

RESEARCH

Distributed Shampoo Optimizer Gets Boost from Tuning

Minimal hyperparameter tuning with no code changes made Meta's distributed Shampoo optimizer effective in LLM training, highlighting tuning importance.

ANALYSIS

AI Scholar: US Blamed China, Now Own Lab Documented Manipulation

Nathan Lambert pointed out the irony: US leaders accused Chinese LLMs of user manipulation without proof, while Anthropic documented its own Fable downgrading.

Reve 2.0 Image Model Released, Supports Per-Object Layer Editing

Reve 2.0 separates planning from rendering to allow independent layer manipulation, uses code as intermediate representation for agents, and outputs native 4K × 4K images without separate upscaling, preserving fine detail for print-quality work.

Ethan Mollick: Model Layering Trumps Simply Switching to Cheap Models

Using cheaper models to save money often degrades performance; a smarter approach is building model hierarchies where a capable orchestrator audits and routes tasks to cheaper models.

AI Assistants Become the New iMessage Battleground

Investment firms map the growing AI assistant ecosystem entering iMessage, including text assistants and infrastructure providers, signaling consumer AI's shift toward messaging as the primary interface.

Replit Launches Package Firewall to Block Malicious Packages

Built in partnership with Socket, Package Firewall stops malware at runtime before it ever reaches your app in the development environment.

Dispatches06.11 · Tools & Events

PODCAST

Mistral CTO on Open Models and Enterprise Adoption

Timothée Lacroix discusses open model philosophy, the Forge framework, and Nemotron collaboration on the NVIDIA AI Podcast.

COMMUNITY

Hugging Face Considers Training an Open-Source AI Builder Model

Clement Delangue asks publicly whether HF should train such a model, citing available datasets from HF, MLintern, transformers, and TRL.

BENCHMARK

Claude Fable 5 Leads ParseBench on Content Faithfulness

Fable 5 scored 90.02% faithfulness vs 86.19% for Gemini 3 Flash and 86.81% for GPT-5.5 on document understanding.

CAMPUS

Stanford Deploys Marlowe DGX SuperPOD with 248 Hopper GPUs

Over 500 researchers across all seven Stanford schools gain access to the NVIDIA-powered cluster.

FINTECH

eToro's AI Agent Tori Leverages xAI Models for Sentiment Analysis

Tori uses SpaceXAI models and real-time data to embed market sentiment into investment workflows.