Hermes Agent Integrates Grok, Gains Ability to Search X Posts
xAI has integrated Grok into Hermes Agent, Nous Research's open-source, self-improving agent. X Premium subscribers can now invoke Grok directly from within Hermes Agent, which has also gained the ability to search posts on X, giving the agent access to real-time information from the platform. The integration makes Grok's capabilities available outside the xAI ecosystem.
No Verifier Needed: Eight LLM Calls Boost Codeforces Elo by 405 Points
A verifier-free method achieves dramatic competitive programming results in roughly 27 minutes using only sequential LLM reasoning rounds.
Research published this week demonstrates that competitive programming scores can be boosted by 405 Elo points in approximately 27 minutes of wall-clock time using just eight consecutive LLM calls — crucially, without requiring any verifier or outcome-checking module. The method works by chaining LLM responses sequentially, with each call refining the output of the previous one.
If this approach proves effective with cheaper, smaller models, it could democratize elite-level coding assistance and reshape how developers approach algorithmic problem-solving. The researchers plan to test the method on CritPt and harder MathArena benchmarks to gauge how well it generalizes beyond Codeforces problems.
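The item does not give the researchers' prompts or implementation. The sketch below only illustrates the general shape of verifier-free sequential refinement as described: a hypothetical `call_llm` function (any prompt-in, text-out client) is called eight times, and each call receives the previous answer with no execution or test feedback. The prompt wording is an assumption for illustration.

```python
from typing import Callable, List


def refine_sequentially(
    problem: str,
    call_llm: Callable[[str], str],  # hypothetical client: prompt in, completion out
    rounds: int = 8,                 # eight chained calls, as in the reported setup
) -> List[str]:
    """Chain `rounds` LLM calls, each one refining the previous answer.

    No verifier is involved: nothing is compiled, executed, or tested between
    rounds, so the only signal each call gets is the prior answer itself.
    Returns every intermediate answer so the trajectory can be inspected.
    """
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    # Round 1: solve the problem from scratch.
    answer = call_llm(f"Solve this competitive programming problem:\n{problem}")
    answers = [answer]
    # Rounds 2..N: feed the previous answer back in and ask for an improved one.
    for _ in range(rounds - 1):
        prompt = (
            f"Problem:\n{problem}\n\n"
            f"Previous solution:\n{answer}\n\n"
            "Find any mistakes in the previous solution and write an improved one."
        )
        answer = call_llm(prompt)
        answers.append(answer)
    return answers


if __name__ == "__main__":
    # Stub LLM for demonstration: returns a numbered placeholder per call.
    prompts_seen: List[str] = []

    def fake_llm(prompt: str) -> str:
        prompts_seen.append(prompt)
        return f"solution v{len(prompts_seen)}"

    history = refine_sequentially("Sum two integers.", fake_llm)
    print(len(history), history[-1])  # 8 solution v8
```

Because the rounds run strictly in sequence and nothing is executed between them, total wall-clock time is roughly the sum of the individual call latencies.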
NVIDIA Unleashes SANA-WM: A 2.6B Open-Source World Model
NVIDIA released SANA-WM, a fast 2.6-billion-parameter open-source world model with potential applications in robotics simulation, video generation, and embodied AI research. Its speed and efficiency make it a strong candidate for real-time reasoning about the physical world.
AI Chip Maker Cerebras Successfully Goes Public
Cerebras, the AI chip company known for its wafer-scale processors, completed its initial public offering this week. The listing attracted widespread attention from both the semiconductor and AI investment communities, signaling strong market appetite for dedicated AI hardware plays.
Two Barriers to True AI Takeoff: Autonomous Research and Continual Learning
Ethan Mollick identifies two critical missing pieces for genuine AI transformation: robust recursive self-improvement, in which AI acts as an independent researcher rather than merely amplifying human effort, and continual learning. Achieving either would represent a major shift in AI's trajectory.
Yann LeCun Predicts General Hierarchical World Models Within 12–18 Months
Yann LeCun forecasts that a general method for training hierarchical world models will emerge within the next year to year and a half. These models would enable AI systems to learn abstract representations of the physical world through observation and interaction, a capability currently missing from most architectures.
"FSD V14.3.3 is a banger."
— Elon Musk
TERMS-Bench Evaluates LLM Agents in Economic Negotiation
A new three-tier benchmark tests LLM agents in realistic economic negotiation scenarios without relying on LLM-based judges. TERMS-Bench challenges models on real-world deal-making across multiple complexity levels.
New 9B Parameter Model Built for Tool Calling and Agent Coding
Kyle Hessling released a specialized 9-billion-parameter model trained specifically for tool-calling and agentic coding workflows, targeting practical developer use cases.
Apple MLR Proposes Regularized Trajectory Models
Apple's machine learning research team introduces regularized trajectory models, addressing the challenge of building fast generative models within a principled likelihood-based framework.
Tencent's Free, Locally Runnable Pixal3D Achieves Pixel-Perfect 3D Generation
Tencent's new Pixal3D project maps every pixel directly into 3D space, achieving pixel-perfect 3D generation without requiring multi-view data or complex post-processing pipelines.
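The item does not describe how Tencent's model works internally. As a generic illustration of what mapping each pixel into 3D space means, the sketch below back-projects a per-pixel depth map through a standard pinhole camera model. The depth map and the camera intrinsics (`fx`, `fy`, `cx`, `cy`) are assumed inputs, and this is not Tencent's method.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def unproject_depth(
    depth: List[List[float]],  # depth[v][u]: distance along the optical axis per pixel
    fx: float, fy: float,      # assumed focal lengths, in pixels
    cx: float, cy: float,      # assumed principal point, in pixels
) -> List[Point3D]:
    """Back-project every pixel with positive depth to a 3D point in camera coordinates.

    Pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):        # v indexes image rows
        for u, z in enumerate(row):        # u indexes image columns
            if z <= 0:                     # skip pixels without valid depth
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    # A 2x2 image at unit depth with the principal point at its center.
    print(unproject_depth([[1.0, 1.0], [1.0, 1.0]], 1.0, 1.0, 0.5, 0.5))
```

Each pixel yields exactly one 3D point, which is the one-to-one pixel-to-space correspondence the item alludes to.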
Models Vary Dramatically in Recovery from Poor Initial Agent Output
Experiments show large differences in how models respond when starting from the worst agent outputs. V4 is generally worse but recovers the most, raising questions about in-distribution versus out-of-distribution robustness.
DeepSeek Founder Liang Wenfeng Maintains Fully Domestic Strategy
DeepSeek founder Liang Wenfeng has pursued technological independence since founding the company in 2023. The strategy was reinforced after the release of V2 in 2024 and has since grown more confident, setting DeepSeek's path apart from the AGI-oriented approaches of Tencent, ByteDance, and Meta.
V4.1 and V4.1-Vision Model Release Anticipated
The next-generation V4.1 series is expected to arrive shortly, with predictions calling the launch a major moment for the model ecosystem.
Mistral Closes the Performance Gap with Kimi
Commentary indicates that Mistral has closed the performance gap with Kimi, marking a notable shift in the competitive landscape among frontier model providers.
How Should the AI Takeoff Curve Be Measured?
An ongoing discussion asks what metrics best capture AI takeoff — GDP growth rate, frontier lab revenues, percentage of labor automated, or R&D acceleration versus cognitive offloading. The question of measurement remains unresolved.
Grok Build's Iteration Speed Is Lightning Fast
Elon Musk Announces Grok Upgrade
Codex Enables Cross-Device Development
Newer Models Would Outperform GPT-4 in Experiments
Grok Imagine Image Generation Debuts
Elon Musk shared an image generated by Grok Imagine, showcasing the model's AI image generation capabilities in what appears to be a new creative feature rollout.
Luma Hosts Event on AI's Shift from Coding to Creation
On May 21 in Mountain View, Luma holds an event with EazoAI, Agora, and SEAMATE to explore the question: when AI handles the syntax, what is a developer's actual value?
Kling AI Reveals Cannes Conference Agenda
Kling AI will host a conference at Cannes centered around three collaborative film projects — Born of the Tide, House of David, and RAPHAEL — exploring how AI redefines cinematic creation from bold ideas to the big screen.
xAI Algorithm Analysis Allegedly Reveals Hidden Content
A user claims to have spent $500 using Claude to analyze an xAI algorithm release, uncovering previously unnoticed content within the recommendation system's code.
Anthropic's Official Skill-Building Guide Now Bilingual
A community member translated Anthropic's official skill building guide into a bilingual Chinese-English version, making the documentation accessible to a broader developer audience.
PPT Skill + Codex + HeyGen HyperFrames Combination Demo
A user demonstrates combining multiple AI tools to generate an explanatory video with effects, previewable directly in chat. The pipeline integrates document generation, code execution, and video synthesis.
GPT-5.5 Pro Faces the Hardest Humor Pairing Challenge
Ethan Mollick tested GPT-5.5 Pro's ability to generate funny word pairs by applying techniques from a linguistics paper. The model produced creative combinations including scrotum snorkel, tuba subpoena, waffle coffin, and diarrhea tiara.
Grok Build Runtime Can Reach 10 Minutes
Users find that the /implement command can make Grok Build run continuously for extended periods, with one user reporting their first 10-minute build session.
Grok Build Excels at Sub-Agents and Persona Handling
Users praise Grok Build's performance in sub-agent management and role-playing, noting that most people still underutilize the model's multi-agent orchestration capabilities.
Figure Robot's Autonomous Demo Draws Skepticism
Critics argue Figure's non-stop autonomous operation demo may merely showcase hardware reliability on a trivial sorting task rather than demonstrating advanced autonomous intelligence.
Build Projects on Phone Using Codex in ChatGPT App
Users can now build applications directly on mobile devices via Codex integrated in the ChatGPT app, enabling on-the-go development workflows.
Codex Makes Quick Lookups at Work Fun
Users praise Codex for easily finding previously viewed spreadsheets and documents, describing the experience as more efficient and fun than manual search.
Jerry Liu on Context Engineering for Financial AI Agents
LlamaIndex founder Jerry Liu highlights that financial AI agents depend on high-quality document context engineering. He categorizes these systems into several types, each requiring distinct approaches to information retrieval and structuring.
Nat Lambert Urges Independent Thinking in AI
After leaving San Francisco, Nat Lambert gained space to cultivate his own beliefs about AI. He argues that monoculture in the AI community only helps incumbents win, and that the field needs more people zagging rather than following the herd.
AI Singularity in von Neumann's Sense: Unknowable Transformation
Ethan Mollick argues that by von Neumann's original definition — the point beyond which human affairs cannot continue as we know them — the singularity concept rings true. Like the Industrial Revolution, the transformation is by definition unpredictable in advance.
Data Centers Boost Economy but Also Raise Electricity and Housing Prices
New research shows data centers create economic activity in related sectors and raise county-level income aggregates. However, they also increase electricity prices and are associated with higher housing costs in surrounding areas.
Late Interaction Model Community Active, Tooling Mature
The community considers late interaction models a smart choice, with ample tooling and support available.
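Late interaction models, of which ColBERT is the best-known example, encode the query and the document into one embedding per token and defer their interaction to scoring time. The sketch below shows the MaxSim operator ColBERT uses for that scoring, with toy 2-dimensional vectors standing in for real embeddings; it is a plain-Python illustration, not any particular library's API.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late-interaction (MaxSim) relevance score.

    For each query token embedding, find its best-matching document token
    (highest dot product), then sum those maxima over all query tokens.
    `doc_tokens` must be non-empty, since max() of an empty sequence fails.
    """
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]
    docs = {"a": [[1.0, 0.0], [0.5, 0.5]], "b": [[0.0, 1.0], [1.0, 0.0]]}
    ranked = sorted(docs, key=lambda name: maxsim_score(query, docs[name]), reverse=True)
    print(ranked)  # ['b', 'a']
```

Because document token embeddings do not depend on the query, they can be precomputed and indexed, which is a large part of why the tooling around these models has matured.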
Grok Surpasses 126 Million Posts on X
Data shows Grok's posting volume on X is growing rapidly as millions of users engage with the platform.
AI Engineer Conference Coming to India
Swyx previews that the AI Engineer conference series will expand to India in the near future.
Reinforcement Learning and RLMS Keynote Delivered
Jack Minong gave a keynote on reinforcement learning and RLMS at the AI Engineer Singapore conference.
First Project Built with Grok Build Goes Live
A user shares a project actually built using Grok Build, calling it useful for anyone working with AI agents.
Darren Aronofsky: AI Technology Liberates Directors
At the Cannes AI Summit, director Darren Aronofsky said AI can liberate directors, citing Orson Welles as an example.
AI-Generated Odyssey Film Version Sparks Discussion
Ethan Mollick used AI to create a complete version of The Odyssey, claiming it may be the most definitive adaptation since Homer.
Runway Seedance 2 Generates AI Short Film
CoffeeVectors showcases an AI-generated short film made with Seedance 2, ByteDance's video generation model, on Runway.