June 8, 2026 · Monday

Sakana AI Founds RSI Lab to Build Self-Improving AI Loop

Sakana AI announced the formation of the Recursive Self-Improvement Lab (RSI Lab), dedicated to building a closed-loop system where models and AI scientists optimize each other. The lab builds on prior work including Agent Native Model and AI Scientist, aiming to create a cycle where models generate researchers that in turn refine the models themselves.

Sakana AI argues the real frontier lies in open systems that continuously improve under real-world limitations, not just ever-larger models and compute clusters.

The lab's foundation rests on a counterintuitive thesis: major technical leaps often arise from resource constraints rather than abundance. In an era where the AI industry is fixated on ever-larger models and ever-bigger compute clusters, Sakana AI argues that the real frontier lies in open systems that can continuously improve themselves under real-world limitations. The RSI Lab draws on the company's accumulated research in Agent Native Models, AI Scientist, and related areas. The ultimate goal is a system where models generate AI scientists, and those AI scientists in turn optimize the models: a self-reinforcing loop that could fundamentally reshape how AI systems are developed.

OpenAI Executive: Chat Is Dead, ChatGPT Shifting to Agent Platform

An OpenAI executive told the Financial Times that "Chat is dead" and ChatGPT is transitioning from a pure chat tool to an agent platform, marking its biggest strategic shift since 2022.

OpenAI is preparing the most significant transformation of ChatGPT since its 2022 launch. An internal executive declared to the Financial Times that "Chat is dead," signaling a strategic pivot from a conversational interface to a full agent platform. While the ChatGPT brand is not expected to change, the product itself will no longer be merely a chat tool. The shift reflects the broader industry movement toward autonomous AI agents that can plan and execute multi-step tasks rather than simply responding to prompts. The agent paradigm redefines what it means to interact with an AI system — from asking questions to delegating complex workflows.

SK hynix and NVIDIA Forge Multi-Year Memory Partnership

SK hynix and NVIDIA announced a multi-year technology collaboration to jointly develop next-generation memory for global AI infrastructure.

The partnership will focus on co-developing advanced memory technologies purpose-built for the next wave of AI infrastructure demands. As model sizes continue to grow and inference workloads intensify, memory bandwidth and efficiency have become critical bottlenecks in the AI supply chain. This collaboration pairs SK hynix's manufacturing expertise with NVIDIA's system-level AI architecture knowledge, aiming to deliver integrated solutions that push beyond current HBM limitations. The multi-year scope signals a commitment to sustained, generational improvements rather than a one-off product collaboration, with implications for every major AI training and inference deployment worldwide.

Whenever I don't use Codex for a task, I ask myself why and usually realize that there's some missing context, I needed to write a skill, or I just didn't think to use it. Rarely is it because the task is outside of the capabilities of the model. Overhang right now feels large.
- gdb (Greg Brockman), OpenAI co-founder

Gemma 4 MTP Merged into llama.cpp, Delivering 2x Inference Speed

llama.cpp added multi-token prediction (MTP) support for Gemma 4 via PR #23398. Dense models averaged more than double the inference throughput, while MoE variants showed no significant speedup. Combined with quantization-aware training, this makes Gemma 4 a compelling option for lightweight local inference on consumer and edge hardware. The merge was contributed by am17an and validated through community testing before landing in the main branch.
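For readers who want intuition for where such a speedup can come from, here is a minimal sketch of greedy draft-and-verify decoding, the scheme multi-token prediction is commonly paired with. It is not llama.cpp's implementation: `draft_next` and `target_next` are hypothetical stand-ins for a cheap multi-token head and the full model, and the verification loop here would be a single batched forward pass in a real system.

```python
from typing import Callable, List, Tuple

Token = int
NextFn = Callable[[List[Token]], Token]  # maps a context to its next token


def speculative_step(prefix: List[Token], draft_next: NextFn,
                     target_next: NextFn, k: int) -> List[Token]:
    """Draft k tokens cheaply, then keep the prefix the target model agrees with."""
    drafted: List[Token] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        drafted.append(tok)
        ctx.append(tok)

    accepted: List[Token] = []
    ctx = list(prefix)
    for tok in drafted:
        expected = target_next(ctx)
        if tok != expected:
            # First disagreement: emit the target's token and stop this step.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # Every draft matched, so the verification pass also yields one extra token.
    accepted.append(target_next(ctx))
    return accepted


def generate(prefix: List[Token], draft_next: NextFn, target_next: NextFn,
             k: int, n_tokens: int) -> Tuple[List[Token], int]:
    """Return the generated sequence and the number of verification steps used."""
    out = list(prefix)
    steps = 0
    while len(out) - len(prefix) < n_tokens:
        out.extend(speculative_step(out, draft_next, target_next, k))
        steps += 1
    return out[:len(prefix) + n_tokens], steps
```

The output always matches what the target model would produce on its own; only the number of verification steps changes. A draft that is usually right finishes in far fewer steps, while a draft that is usually wrong gains nothing.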

Figure Humanoid Robot Production Surges to One per Hour

Figure increased humanoid robot production from one per day to one per hour in just 120 days, achieving a 24x improvement in manufacturing throughput. Industry observers note that this trajectory, if sustained, could yield tens of thousands of units by late 2026. The ramp has direct implications for the physical AI and embodied agent landscape, where robot availability has been the primary scaling bottleneck. By H2 2027, Figure's robots could begin meaningfully impacting productivity in US manufacturing and logistics operations.
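A quick back-of-envelope check of these figures, as a sketch only. It assumes round-the-clock production and counts from this issue's date to year end; neither assumption comes from Figure. The two helper functions are illustrative, one for a flat rate and one for a ramp that keeps compounding at the same pace.

```python
import math
from datetime import date

HOURS_PER_DAY = 24  # assumption: the line runs around the clock


def throughput_gain(old_per_day: float, new_per_day: float) -> float:
    """Ratio between the new and old daily output."""
    return new_per_day / old_per_day


def units_at_constant_rate(per_day: float, days: int) -> float:
    """Cumulative units if output stays flat at today's rate."""
    return per_day * days


def units_with_continued_growth(per_day_now: float, growth_factor: float,
                                growth_period_days: float, days: int) -> float:
    """Cumulative units if the rate keeps multiplying by growth_factor
    every growth_period_days (integral of an exponential rate curve)."""
    k = math.log(growth_factor) / growth_period_days
    return per_day_now * (math.exp(k * days) - 1) / k


days_left = (date(2026, 12, 31) - date(2026, 6, 8)).days
gain = throughput_gain(1, HOURS_PER_DAY)
flat = units_at_constant_rate(HOURS_PER_DAY, days_left)
ramp = units_with_continued_growth(HOURS_PER_DAY, gain, 120, days_left)

print(f"gain: {gain:.0f}x over 120 days")
print(f"flat rate through year end: {flat:,.0f} units")
print(f"same growth pace sustained: {ramp:,.0f} units")
```

Under these assumptions a flat one-per-hour rate gives roughly 4,900 units by year end, so the "tens of thousands" projection depends on the ramp continuing rather than on the current rate alone.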

Vercel AI Gateway Recovers 1 Trillion Tokens Monthly

Vercel AI Gateway recovers over 1 trillion tokens per month through intelligent retries, charges zero markup over the labs' own pricing, and adds redundancy and zero-data-retention enforcement.
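The general idea of recovering failed requests through retries and provider redundancy can be sketched as follows. This is a generic illustration, not Vercel's API or implementation: `ProviderError`, `call_with_failover`, and the provider callables are all hypothetical names.

```python
import time
from typing import Callable, Optional, Sequence

Provider = Callable[[str], str]  # hypothetical: takes a prompt, returns a completion


class ProviderError(Exception):
    """Hypothetical transient failure from an upstream model provider."""


def call_with_failover(providers: Sequence[Provider], prompt: str,
                       max_attempts_per_provider: int = 2,
                       base_delay: float = 0.0) -> str:
    """Try each provider in order, retrying transient errors with backoff."""
    last_error: Optional[Exception] = None
    for provider in providers:
        for attempt in range(max_attempts_per_provider):
            try:
                return provider(prompt)
            except ProviderError as err:
                last_error = err
                # Exponential backoff before the next attempt (0 by default).
                time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("all providers failed") from last_error
```

A request counts as "recovered" in this sketch whenever a retry or a fallback provider returns a result after an earlier attempt failed.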

OLMo Series May End; Nemotron Carries the Open-Source Torch

Industry observers suggest the OLMo from-scratch series may be winding down, leaving NVIDIA's Nemotron team as potentially the only one still pursuing fully open-source, from-scratch LLM training.

Biggest Code Eval of the Year Set to Launch

swyx teases the biggest code evaluation launch of the year, promising to reshape standards for the next phase of code generation benchmarks.

China Chip Breakthrough: Logic Capacity Sufficient, HBM Bottleneck Bypassed

China's logic chip capacity reportedly sufficient for millions of chips; HBM and interposer bottlenecks have been bypassed.

Sources indicate that China's logic chip production capacity can support manufacturing at the scale of millions of chips, and that critical HBM and interposer bottlenecks have reportedly been routed around. The resulting chips are described as Hopper-tier in performance (competent but not cutting-edge), yet the ability to bypass two of the industry's most persistent constraints represents a significant strategic milestone. The development has ramifications for the global AI hardware race, particularly for analyses framed around Dario Amodei's arms race timeline. Even with performance limitations, the system integration capability means qualified AI compute systems can be built at scale outside the existing supply chain structure.

Paper Challenges LLM Anthropomorphism with Null Hypothesis Test

Yann LeCun shared a provocative paper arguing that any sufficiently strong substrate can appear human-like, using networks trained on Age of Empires II as a compelling example.

The paper questions the common practice of attributing human-like qualities such as morality or natural language understanding to LLMs. By training simple neural networks on the game Age of Empires II, the authors demonstrate that even non-language substrates can exhibit behaviors typically interpreted as "intelligent." The core argument is that LLM human-like attributes are not empirically unique: any strong enough substrate, from Lego blocks to the Greater Boston area, could theoretically produce similar phenomena. The paper proposes a "null hypothesis" framework: assume LLM non-uniqueness as the experimental starting point, forcing researchers to establish explicit measurement criteria before drawing conclusions about model capabilities.

NVIDIA Dominates Hugging Face Trending with 9 of Top 30 Models

Among the 30 hottest models on the Hugging Face homepage, NVIDIA published 9, signaling a strong return of American open-source AI.

The concentration of NVIDIA models on Hugging Face's front page underscores a broader shift in the open-source landscape. After a period where Chinese and community-driven models dominated the trending charts, NVIDIA's recent releases — including the Nemotron family — have reasserted US institutional presence in open-weight AI. Nine out of thirty trending models bearing a single company's name is unprecedented in Hugging Face history, reflecting both the volume and quality of NVIDIA's recent open-source push. The trend aligns with NVIDIA's broader strategy of building the software ecosystem around its hardware dominance.

Western Frontier Models Dominate Hard Benchmarks Over China and Open-Source

A comprehensive compilation of multiple hard, private, and out-of-distribution evaluations reveals that Western frontier models — from OpenAI, Anthropic, and Google — lead Chinese and open-source alternatives by a substantial margin. The gap is particularly pronounced on private OOD benchmarks where models face entirely novel problem distributions. While open-source models have closed much of the gap on public benchmarks like MMLU and HumanEval, the private evaluation landscape tells a different story: frontier labs retain a significant advantage when models are tested against genuinely unseen, adversarially designed assessment suites.

DeepSeek V3's Taste Traced to Liang Wenfeng's Personal Annotation

Industry commentary suggests that the distinctive quality and taste of DeepSeek V3 can be traced to founder Liang Wenfeng personally annotating training data. The observation highlights a deeper point about AI development: taste is not just about glamorous architecture designs — it must be demonstrated by example. That a CEO at Liang's level would personally handle data annotation underscores DeepSeek's hands-on engineering culture. The practice is seen as bullish for Meta, suggesting that data curation discipline may be a more durable moat than model architecture innovations alone.

PRODUCT

OpenAI Releases Dozens of Real-World AI Workflow Examples

OpenAI published multiple real-world case studies showing how teams use AI to automate tasks across various industries, from email management to complex data pipelines.

INDUSTRY

NVIDIA and Doosan Group Expand Physical AI and Robotics Collaboration

NVIDIA and South Korea's Doosan Group announced expanded cooperation in physical AI, robotics, and AI factory infrastructure across manufacturing sectors.

COMMUNITY

Over 25 Major Open-Source AI Models Released This Week

victormustar catalogs more than 25 notable open-weight model drops in a single week, calling it the craziest period in open-source AI history. Yann LeCun amplified the signal.

PAPERS

HuggingFace Weekly Picks: PEFT Scaling and New Architectures

Hot papers this week include PEFT scaling to million-parameter models and novel architecture research, curated by the HuggingFace team.

RELEASE

Super Gemma 4 26B Uncensored GGUF v2 Released

Community release achieves zero refusal rate with actual uncensored outputs, plus fixes for tool-call and tokenization issues in the original.

PREDICTION

Replit President Predicts AGI by 2028

Replit President Michele Catasta stated in an interview that AGI supporting vibe-coding could arrive before 2028, a notably aggressive forecast.

PAPER

Argus-Retriever: First Late-Interaction Visual Document Retriever

Argus-Retriever combines query-conditioned document representations with late interaction, so the document representation adapts to the query, enabling visual document retrieval at scale.
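For readers unfamiliar with late interaction, the sketch below shows the ColBERT-style MaxSim scoring the term usually refers to: each query vector is matched to its best document vector, and the maxima are summed. It is a generic illustration with toy vectors, not Argus-Retriever's code, and it does not model the query-adaptive document representation described above. The names `maxsim_score` and `rank` are illustrative.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Sum, over query vectors, of the best similarity to any document vector."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rank(query_vecs: Sequence[Vector],
         docs: Dict[str, Sequence[Vector]]) -> List[str]:
    """Order document names by MaxSim score, highest first."""
    return sorted(docs, key=lambda name: maxsim_score(query_vecs, docs[name]),
                  reverse=True)
```

In a visual document retriever the document vectors would typically be embeddings of page patches rather than text tokens, but the scoring step has the same shape.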

HISTORY

AI Model Open-Source Status: Many Classics Only Released Weights

Review shows milestone models like AlexNet and Transformer often released neither code nor weights; ResNet, GPT-2, BERT only released weights.

TREND

Agentic AI Boosts Output but Adoption Stagnates

Data suggests agentic AI significantly increases individual output, but overall organizational adoption has not grown, revealing a critical disconnect.

Product & Tools · 06.08

DESIGN

GPT-5.5 Design Quality Lags Behind Opus 4.8, User Comparison Shows

Users compared GPT-5.5 with Opus 4.8 on the baoyu-design skill; Opus 4.8 significantly outperforms in UI/UX generation quality, with the recommendation to pair both.

VIDEO

Google Omni Model Performs Precise Video Local Editing

Demonstration shows the Omni model performing targeted object replacement in video — changing a frog to a kitten while preserving the entire background frame perfectly.

ROBOTICS

VLA-JEPA Robot Model Launches on LeRobot Framework

VLA-JEPA learns from visual features rather than just mimicking actions, improving generalization for robotic manipulation tasks beyond the training distribution.

ANALYSIS

Market Forces Behind Declining Research Paper Output

swyx theorizes researchers realized they could walk out and raise $100M+ rather than fight marketing departments, driving a structural decline in lab publications.

ECONOMY

LLMs May Create Office Worker Surplus While Robots Remain Scarce

Commentary suggests LLMs will oversupply cognitive labor while physical robots stay expensive and rare, potentially validating population-maximizing economic theories.

HARDWARE

Custom Silicon: AGI Strategy or Funding Competition?

Analysis argues custom chip development is not purely about AGI preparation but a strategic move to compete with NVIDIA for investment and reduce external hardware dependence.

Briefs & Updates · 06.08

ROBOTICS

Reachy Mini Robot Runs Real-Time Locally

Reachy Mini achieves near-real-time response via local inference; v1.8.0 supports MCP extensions.

IMAGES

Ideogram 4.0 Partners with Lovart for New Features

Ideogram 4.0 jointly released new features with Lovart, expanding AI image generation capabilities.

PLATFORM

Replit CEO: Eliminate Distractions, Focus on Speed to Market

Replit CEO Amjad Masad emphasizes that the platform aims to strip away friction so developers can focus purely on shipping and profitability.

FIX

Grok Build Stable v0.2.32 Fixes web_fetch Crash

The latest Grok Build release resolves a crash/panic during web_fetch operations.

TOOLING

Grok Build Now Passes .envrc Directly to Agent Shell

Grok Build automatically loads environment variables from .envrc into the agent shell, streamlining configuration.

FIX

Grok Build Resolves grep Timeout Issue

Elon Musk confirmed the latest Grok Build has fixed a persistent grep timeout fault.

SAFETY

Nathan Lambert on AI Safety: Much Remains Unknown

Nathan Lambert shares examples illustrating how little we control in current models, emphasizing the urgency of safety research.

OPINION

AI Makes Unique Ideas Cheap to Execute, Discovery Is the Real Challenge

Ethan Mollick notes AI drastically reduces implementation costs for novel ideas, but finding those ideas remains a major opportunity.

CREATIVE

Omni Flash Plus Dreams 3D Yields Impressive Visual Results

Simple prompts like "wrap in stark realism" achieve style transfers between Omni Flash and Dreams 3D artwork.

ROBOTICS

Human-Robot Motion Replication Tech Draws Wide Attention

CTO Robotics demo of human-robot motion mirroring sparks comparisons to giant mecha and practical engineering questions.

WORKFLOW

Claude Design First, Then Code: A Proven Development Pattern

Users share a workflow: design UI/UX with Claude Design, generate HTML+CSS+React prototypes, then develop the application from the generated structure.

COMPARISON

Deep Research Showdown: ChatGPT Leads, Gemini Excels at Search

User evaluation rates ChatGPT Deep Research best overall, Claude average, Gemini strongest in search; many use ChatGPT and Gemini together.

TOOLS

Cursor Integrates Browser and Element Annotation

Cursor adds browser preview and element annotation, effectively turning into a local design studio running Claude Design.

CREATIVE

Adobe Firefly Integrates Aleph 2 for Creative Environment

Adobe is shifting from adding individual AI models to building a complete creative environment around Firefly and Aleph 2.

STRATEGY

Grok Outlook Grim if Musk Still Sees Inference as Traditional Training

Commentary warns that treating inference capacity with a traditional training mindset will leave Grok behind, as inference capacity is now functionally training capacity.

SCIENCE

AI Set to Transform Materials Science

An impressive list of specific atomic-combination materials raises the question: how will AI reshape materials discovery in this new era?

MOBILE

Claude Code Mobile Remote Control Frustrated by Constant Permissions

Users report Claude Code's remote control on mobile requires repeated permission confirmations after planning, creating a poor experience.

HACKING

Developer Builds HAR Parser to Decrypt Claude Design Requests

To study Claude Design internals, a developer created a HAR parsing tool that decrypts binary content to reveal the underlying prompts.

DESIGN

Claude Design's 8 Golden Rules for Product Design

Eight timeless principles from Claude Design, including: "A prototype nobody clicks is just a painting" and "The best design system is the one nobody notices."

OPINION

AI Features in Productivity Tools See Low Usage Among Power Users

AI researcher Graham Neubig reports rarely using AI features in Superhuman, Linear, and Slack, preferring keyboard shortcuts for speed.

More Releases & Signals · 06.08

HUB

Hugging Face Post-Training and Push Pipeline Gains Traction

Users can post-train a model on Hugging Face and push directly to the Hub, simplifying deployment workflows.

VIDEO

PixVerse Originals Debuts AI-Made Sci-Fi Comedy Short

PixVerse Season 1 launches 'Mars Landing,' an AI-generated sci-fi dark comedy showcasing video generation capabilities.

EVENTS

SaaStr AI 2026: Replit CEO Shares Stage with Top Users

Replit CEO Amjad Masad and a top-0.1% power user discuss AI development practices live at SaaStr.

OPINION

AI Labs as Economic Black Holes: Absorb Capital, Emit Only Data

A provocative definition of AI labs: entities that continuously absorb capital and physical mass but only output tokens.

AUTOMATION

User Shifts from Prompting Claude to Writing Loops That Prompt It

A new working mode: writing outer-loop logic that continuously prompts Claude to decide next steps, with the human only writing the orchestration.
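The pattern can be sketched as a small orchestration loop. This is a minimal illustration under stated assumptions: `ask_model` is a hypothetical callable standing in for whatever client actually sends the prompt, and the `DONE` sentinel and prompt wording are invented for the example.

```python
from typing import Callable, List


def run_loop(goal: str, ask_model: Callable[[str], str],
             max_steps: int = 10) -> List[str]:
    """Repeatedly ask the model for the next step until it replies DONE."""
    history: List[str] = []
    for _ in range(max_steps):
        prompt = (
            f"Goal: {goal}\n"
            f"Steps completed so far: {history}\n"
            "Reply with the single next step, or DONE if the goal is met."
        )
        step = ask_model(prompt).strip()
        if step == "DONE":
            break
        history.append(step)
    return history
```

The `max_steps` cap is the human's main safeguard in this design: the model decides what happens next, but the loop decides how long it is allowed to keep going.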

WORKFLOW

Click-Based AI Creation: ComfyUI, Photoshop, AE with GPT Image 2.0

A streamlined click-based AI creation pipeline combining ComfyUI, Photoshop, After Effects, and GPT Image 2.0.

MEDIA

Glif Music Video Workflow Gets 2x Speed Boost

Glif Agent optimized the music video workflow over the weekend, doubling speed while users guide the agent step by step.

TIPS

Seedance 2: Hand-Drawn Camera Motion Control for Scenes

Glif lets users draw camera motion paths on input images, combined with Seedance 2 for precise video generation control.

CRITIQUE

GPT-Image-2 Traditional Art Style Draws Criticism

Users describe GPT-Image-2's traditional art output as "dirty and flat," with visible autoregressive grain artifacts.

COMPUTE

DeepSeek 384 Cluster Enables Parallel RL Teacher Cloning

DeepSeek's 384-cluster can run parallel reinforcement learning with teacher cloning and merging; V5 report is highly anticipated.