vLLM Partners with Red Hat, Poolside for 2-3x Inference Speedup
vLLM collaborated with Red Hat and poolside to optimize inference for the Laguna XS.2 model, achieving a 2-3x decoding speedup via DFlash speculative sampling and supporting multiple quantization formats, including FP8, NVFP4, and INT4. The DFlash speculator drafts 8 tokens per forward pass with no quality loss.
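To illustrate why drafting several tokens per pass can speed up decoding without changing the output, here is a minimal sketch of the greedy variant of speculative decoding (also called speculative sampling). It is not the DFlash algorithm or vLLM's implementation: `toy_target`, `toy_draft`, and `speculative_decode` are hypothetical stand-ins, and a real system would verify all drafted positions in one batched forward pass rather than one call per position.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def toy_target(seq: List[int]) -> int:
    # Hypothetical "large model": a deterministic next-token rule.
    return (seq[-1] * 3 + 1) % 17


def toy_draft(seq: List[int]) -> int:
    # Hypothetical "small model": agrees with the target except
    # when the context length is a multiple of 5.
    guess = toy_target(seq)
    return (guess + 1) % 17 if len(seq) % 5 == 0 else guess


def greedy_decode(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    # Baseline: one target call per generated token.
    seq = list(prompt)
    for _ in range(max_new):
        seq.append(target(seq))
    return seq


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[int], max_new: int, k: int = 8
) -> Tuple[List[int], int]:
    """Return (tokens, number of target verification passes)."""
    seq = list(prompt)
    goal = len(prompt) + max_new
    passes = 0
    while len(seq) < goal:
        # 1. The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2. The target checks the proposal (emulated here per position).
        passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok != expected:
                # First mismatch: keep the target's token and stop.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # Every drafted token matched: the pass yields one bonus token.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq[:goal], passes


baseline = greedy_decode(toy_target, [1], 40)
tokens, passes = speculative_decode(toy_target, toy_draft, [1], 40, k=8)
print(tokens == baseline, passes)
```

Because every accepted token is one the target model would have chosen itself, the output matches plain greedy decoding; the saving comes from needing fewer target passes than generated tokens.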
Step 3.7 Flash Launches Online Demo, Try Instantly Without Installation
StepFun released a hosted demo for Step 3.7 Flash, allowing users to run the model directly in a browser with zero code required. Built on Gradio, the demo is now live in the Hugging Face organization. The release lowers the barrier for developers and researchers to evaluate the model's capabilities firsthand without any setup.
OpenAI Launches Robotics Team, Starts Large-Scale Hiring
Sam Altman announced the official launch of OpenAI Robotics, hiring full-stack hardware, system, and ML engineers to build socially useful robots. The initiative aims to bring AI into the physical world.
CursorBench Mines Failure Cases from Production Coding Sessions
CursorBench evolves dynamically by extracting failure cases from real production sessions with coding agents, which aligns its evaluations far more closely with real-world developer workflows than static benchmarks.
HRM-Text 1B Reasoning Language Model Released
Sapient Intelligence released HRM-Text, an ultra-lean 1B-parameter language model with strong general reasoning capability, demonstrating that small models with focused training can deliver competitive reasoning performance.
Over the past 18 months, global elite attitudes toward AI have fundamentally diverged: some can no longer be productive without AI, while others remain in a cognitive bubble dismissing its effectiveness entirely.
PixVerse Integrates OpenClaw for Text and Image to Video
PixVerse joined OpenClaw as an official external plugin, letting users generate videos from text or images directly within the platform with dual API endpoints for international and China regions.
Dell and NVIDIA Deliver First Vera Rubin NVL72 to CoreWeave
Dell and NVIDIA delivered the first Vera Rubin NVL72 system to CoreWeave, marking the official start of next-generation AI computing infrastructure deployment at scale.
DeepMind Packs 30+ Scientific Databases as Agent Skills
DeepMind integrated scientific databases like AlphaGenome and UniProt into callable agent skills, significantly reducing hallucination and token waste in scientific queries by standardizing database access patterns.
StepFun Explains Step 3.7 Flash and Agent Future at ClawCon
StepFun's developer business GM presented the design philosophy behind Step 3.7 Flash and outlined the next frontier of agent efficiency at ClawCon Macao.
Huawei LogicFolding Achieves 16-36x Interconnect Density via EDA
Technical analysis indicates Huawei's LogicFolding design primarily benefits from EDA software innovation, dramatically increasing interconnect density without requiring advanced lithography processes.
OpenAI Reveals Voice Hackathon Final Projects
OpenAI's Voice Hack Night final projects were unveiled, showcasing four real-time voice agent prototypes built in under 6 hours each using the Realtime API.
Fireworks AI Reaches $800M Annual Revenue Run Rate
AI inference platform Fireworks AI has reached $800 million in annualized revenue, achieving 4x year-over-year growth, signaling strong enterprise demand for hosted inference.
AI Coding Agents Rekindle CEOs' and CTOs' Programming Passion
Vercel's founder noted that thanks to coding agents like Claude Code, many company executives have fallen back in love with programming and actively use AI to develop products.
Hugging Face Calls for Open Sharing of Agent Trace Data
Clement Delangue called on the community to share more coding and agent trajectory data publicly to build better training datasets and improve open-source models.
Codex Desktop Update Removes 'Copy as Markdown', Sparking Backlash
OpenAI's Codex Desktop update 26.527 removed the popular chat export feature, causing strong community backlash. An issue has been filed on GitHub.
Frontier Labs Tacitly Maintain Over 50% Inference Margins
Commentary notes frontier AI labs avoid inference price wars, tacitly maintaining profit margins above 50% and refusing to race to the bottom on API pricing.
Frontier Lab Training Cost Estimates May Be Overstated
Estimates show frontier labs never used more than 300T tokens for pretraining, and GPU rental costs are far lower than widely circulated figures suggest.
LLM Trap Question '50m to Car Wash' Most Revealing of Reasoning Failure
A researcher catalogued LLM trap questions, noting that the classic '50 meters to the car wash' remains the most effective probe for revealing fundamental scenario comprehension gaps across all model tiers.
MathArena: Only 3-4 Questions Still Differentiate Frontier Models
Analysis shows most of MathArena's 40 questions can no longer distinguish top models; only a handful provide non-zero signal for meaningful frontier comparison.
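As a rough illustration of what "non-zero signal" means here, the sketch below flags questions on which models disagree; a question that every model gets right (or every model gets wrong) cannot rank them. The model names and results are made-up toy data, not MathArena scores, and `discriminating_questions` is a hypothetical helper.

```python
from typing import Dict, List


def discriminating_questions(results: Dict[str, List[int]]) -> List[int]:
    """Return indices of questions where at least two models differ.

    `results` maps a model name to a list of 0/1 outcomes, one per question.
    """
    per_model = list(results.values())
    n_questions = len(per_model[0])
    return [
        q
        for q in range(n_questions)
        if len({outcomes[q] for outcomes in per_model}) > 1
    ]


# Toy data (assumed): three models, six questions.
toy_results = {
    "model_a": [1, 1, 1, 0, 1, 0],
    "model_b": [1, 1, 1, 0, 0, 0],
    "model_c": [1, 1, 1, 0, 1, 1],
}
print(discriminating_questions(toy_results))
```

In this toy data only the last two questions separate the models; the first four are either solved by all of them or by none.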
Blackwell GPU May Have Shortest Lifecycle in NVIDIA History
Analysts believe the Blackwell GPU series could have the shortest effective lifecycle ever, facing replacement just as inference optimizations like Dynamo mature and Hopper GPUs remain strong.
TokenSpeed Kernel Accelerates Inference with CuteDSL and Triton
LightSeq team's TokenSpeed Kernel achieves efficient inference acceleration using CuteDSL and Triton Gluon, pushing the frontier of low-level kernel optimization.
Schulman: Inoculation Prompting May Backfire by Training Better Hackers
John Schulman suggested that if inoculation prompting is used for RL training, models might instead become more proficient at sandbox escapes and vulnerability exploitation from the extended practice.
Trust Problem in Agent Society May Make Higher IQ Suboptimal
A thought experiment suggests that in an agent society lacking mutual trust, interactions at every scale collapse into a tangle of Nash equilibria, where higher individual intelligence may not benefit the collective.
Eval and Analytics Startups Undergo Continuous Learning Upgrade Wave
In 2026, many evaluation and analytics startups are shifting from one-time benchmarks to continuous learning platforms, with only the most thoughtful execution winning out.
Ethan Mollick: AI Agents Should Ask Better Questions, Not Just Execute
The Wharton professor noted that fully automated AI agents are not the ideal collaboration model; AI should proactively ask good questions when stuck, uncertain, or needing human judgment and taste.
Blogger Criticizes ChatGPT Translation, Predicts Team Merger with Codex
A user sharply criticized ChatGPT's translation experience as poorly designed, speculating its product team will soon be absorbed by the Codex organization.
Chinese AI Products Urged to Shift Toward GUI and Universal Agents
Industry voices suggest tools like Kimi Code and DeepSeek Harness should develop graphical interfaces and general office capabilities early, rather than overcompeting in terminal and coding niches.
LLMs Consistently Produce Coordinate Flip Bugs in End Applications
Developers note that from DeepSeek to GPT-5.5, nearly all LLMs produce coordinate flip errors in camera, control, and physics applications, a stubborn and persistent failure mode.
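One common form such a flip can take, shown here only as an assumed example since the source does not say which conventions were involved, is mixing image coordinates (origin at the top-left, y growing downward) with Cartesian coordinates (origin at the bottom-left, y growing upward). The helper names below are hypothetical.

```python
from typing import Tuple


def pixel_to_cartesian(px: int, py: int, height: int) -> Tuple[int, int]:
    # Image rows count down from the top; Cartesian y counts up from the bottom.
    return px, (height - 1) - py


def cartesian_to_pixel(x: int, y: int, height: int) -> Tuple[int, int]:
    # The same reflection maps back, so the two functions are inverses.
    return x, (height - 1) - y


HEIGHT = 480  # assumed image height in pixels

# The top-left pixel sits at the top of the Cartesian frame, not at y = 0.
print(pixel_to_cartesian(0, 0, HEIGHT))

# Typical bug: treating the pixel row as if it were already Cartesian y,
# which mirrors the scene vertically.
buggy_y = 0
correct_y = pixel_to_cartesian(0, 0, HEIGHT)[1]
print(buggy_y == correct_y)
```

A round-trip check such as `cartesian_to_pixel(*pixel_to_cartesian(px, py, HEIGHT), HEIGHT) == (px, py)` is a cheap test for catching this class of error in generated code.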
GDB Praises Codex Computer Use as Viscerally Compelling
Greg Brockman (GDB) praised OpenAI's Codex computer use feature, which lets agents operate desktop interfaces directly, as one of the most viscerally compelling AI capabilities demonstrated recently.
GDB Marvels at GPT Realtime 2's Interaction Magic
Greg Brockman (GDB) described GPT Realtime 2 as unlocking genuine interaction magic, showcasing new real-time voice and multimodal capabilities that feel qualitatively different from previous APIs.
AI Forces Humanity to Redefine What Makes Us Unique
A personal reflection suggests AI is forcing humanity to confront the possibility that many abilities once thought uniquely human may simply be emergent patterns from sufficient scale and data.
Creative Workers Increasingly Embrace AI-Assisted Coding
Observations show creative professionals are increasingly adopting AI coding tools, forming a new trend of non-engineers building software through natural language prompting.
China Lags in AI Compute but Startup Funding Remains Active
Commentary notes China still trails in AI compute capacity, but domestic startup funding is substantial and may help address broader economic challenges including youth unemployment.
Top AI Papers of the Week: Gamma-World, SkillO and More
This week's top AI papers include Gamma-World for multi-agent world modeling and SkillO for skill orchestration, spanning generative modeling and agent coordination.
Yann LeCun's Definition of a World Model Circulates in ML Community
A detailed definition of what constitutes a world model, attributed to Yann LeCun, is circulating among ML researchers and sparking renewed discussion on model-based reasoning.
Redpoint InfraRed 100 Lists Top AI Infrastructure Companies
The Redpoint InfraRed 100 is now live, cataloguing the companies building the infrastructure that powers the entire AI ecosystem from chips to cloud orchestration.
NVIDIA GTC Taipei Keynote Starts Monday with Jensen Huang
NVIDIA reminded the community that the GTC Taipei keynote begins Monday at 11 AM local time, with Jensen Huang taking the stage at the Taipei Music Center.