May 18, 2026 · Monday



No Verifier Needed: Eight LLM Calls Boost CodeForces Elo by 405 Points

A verifier-free method achieves dramatic competitive programming results in roughly 27 minutes using only sequential LLM reasoning rounds.

The technique achieves +405 CodeForces Elo across eight sequential LLM call rounds — without any verifier module.

Research published this week reports a 405-point gain in CodeForces Elo in approximately 27 minutes of wall-clock time, using just eight consecutive LLM calls and no verifier or outcome-checking module. The method chains LLM responses sequentially, with each call refining the output of the previous one.
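As a rough illustration of the chaining idea described above, here is a minimal Python sketch. It is not the researchers' implementation: the prompt wording, the function name refine_sequentially, and the stub_llm stand-in for a real model call are all assumptions made for this example. The only details taken from the report are the eight sequential rounds and the absence of any verifier step.

```python
from typing import Callable, List


def refine_sequentially(problem: str, llm: Callable[[str], str], rounds: int = 8) -> List[str]:
    """Call `llm` `rounds` times in sequence; each call sees the previous draft.

    No verifier is involved: nothing scores or filters the drafts,
    and the last draft is simply the final answer.
    """
    drafts: List[str] = []
    previous = ""
    for _ in range(rounds):
        if previous:
            # Hypothetical refinement prompt; the real prompts are not given in the report.
            prompt = f"Problem:\n{problem}\n\nPrevious attempt:\n{previous}\n\nImprove it."
        else:
            prompt = f"Problem:\n{problem}\n\nWrite a solution."
        previous = llm(prompt)
        drafts.append(previous)
    return drafts


def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call: bumps a revision counter."""
    marker = "revision "
    if marker in prompt:
        n = int(prompt.rsplit(marker, 1)[1].split()[0])
        return f"revision {n + 1}"
    return "revision 1"


if __name__ == "__main__":
    drafts = refine_sequentially("sum two integers", stub_llm, rounds=8)
    print(drafts[-1])
```

In practice stub_llm would be swapped for a real model client. The point of the sketch is only that the loop has no scoring or selection step between rounds.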

If this approach proves effective with cheaper, smaller models, it could democratize elite-level coding assistance and reshape how developers approach algorithmic problem-solving. The researchers plan to test the method on CritPt and harder MathArena benchmarks to gauge its generalizability beyond CodeForces problems.

"FSD V14.3.3 is a banger."

— Elon Musk

TERMS-Bench Evaluates LLM Agents in Economic Negotiation

A new three-tier benchmark tests LLM agents in realistic economic negotiation scenarios without relying on LLM-based judges. TERMS-Bench challenges models on real-world deal-making across multiple complexity levels.

New 9B Parameter Model Built for Tool Calling and Agent Coding

Kyle Hessling released a specialized 9-billion-parameter model trained specifically for tool-calling and agentic coding workflows, targeting practical developer use cases.

Apple MLR Proposes Regularized Trajectory Models

Apple's machine learning research team introduces regularized trajectory models, addressing the challenge of building fast generative models within a principled likelihood-based framework.

Tencent's Free, Local Pixal3D Achieves Pixel-Perfect 3D Generation

Tencent's new Pixal3D project maps every pixel directly into 3D space, achieving pixel-perfect 3D generation without requiring multi-view data or complex post-processing pipelines.

Models Vary Dramatically in Recovery from Poor Initial Agent Output

Experiments show large differences in how models respond when starting from the worst agent outputs. V4 is generally worse but recovers the most, raising questions about in-distribution versus out-of-distribution robustness.

DeepSeek Founder Liang Wenfeng Maintains Fully Domestic Strategy

DeepSeek founder Liang Wenfeng has pursued technological independence since the company's founding in 2023. The strategy was reinforced after V2 in 2024 and is now pursued with growing confidence, a path distinct from the AGI-oriented approaches of Tencent, ByteDance, and Meta.

V4.1 and V4.1-Vision Model Release Anticipated

The next-generation V4.1 series is expected to arrive shortly, with predictions calling the launch a major moment for the model ecosystem.

Mistral Closes Gap with Kimi in Latest Benchmarks

Commentary indicates that Mistral has closed the performance gap with Kimi, marking a notable shift in the competitive landscape among frontier model providers.

How Should the AI Takeoff Curve Be Measured?

An ongoing discussion asks what metrics best capture AI takeoff — GDP growth rate, frontier lab revenues, percentage of labor automated, or R&D acceleration versus cognitive offloading. The question of measurement remains unresolved.



Products & Demos · 05.18
GROK

Grok Imagine Image Generation Debuts

Elon Musk shared an image generated by Grok Imagine, showcasing the model's AI image generation capabilities in what appears to be a new creative feature rollout.

LUMA

Luma Hosts Event on AI's Shift from Coding to Creation

On May 21 in Mountain View, Luma will hold an event with EazoAI, Agora, and SEAMATE to explore the question: when AI handles the syntax, what is a developer's actual value?

KLING AI

Kling AI Reveals Cannes Conference Agenda

Kling AI will host a conference at Cannes centered around three collaborative film projects — Born of the Tide, House of David, and RAPHAEL — exploring how AI redefines cinematic creation from bold ideas to the big screen.

XAI

xAI Algorithm Analysis Allegedly Reveals Hidden Content

A user claims to have spent $500 using Claude to analyze an xAI algorithm release, uncovering previously unnoticed content within the recommendation system's code.

ANTHROPIC

Anthropic Official Skill Building Guide Now Bilingual

A community member translated Anthropic's official skill building guide into a bilingual Chinese-English version, making the documentation accessible to a broader developer audience.

DEMO

PPT Skill + Codex + Heygen HyperFrames Combination Demo

A user demonstrates combining multiple AI tools to generate an explanatory video with effects, previewable directly in chat. The pipeline integrates document generation, code execution, and video synthesis.

GPT-5.5

GPT-5.5 Pro Faces the Hardest Humor Pairing Challenge

Ethan Mollick tested GPT-5.5 Pro's ability to generate funny word pairs by applying techniques from a linguistics paper. The model produced creative combinations including scrotum snorkel, tuba subpoena, waffle coffin, and diarrhea tiara.

GROK

Grok Build Runtime Can Reach 10 Minutes

Users find that the /implement command can make Grok Build run continuously for extended periods, with one user reporting their first 10-minute build session.

GROK

Grok Build Excels at Sub-Agents and Persona Handling

Users praise Grok Build's performance in sub-agent management and role-playing, noting that most people still underutilize the model's multi-agent orchestration capabilities.

ROBOTICS

Figure Robot's Autonomous Demo Draws Skepticism

Critics argue Figure's non-stop autonomous operation demo may merely showcase hardware reliability on a trivial sorting task rather than demonstrating advanced autonomous intelligence.

CODEX

Build Projects on Phone Using Codex in ChatGPT App

Users can now build applications directly on mobile devices via Codex integrated in the ChatGPT app, enabling on-the-go development workflows.

CODEX

Codex Brings Random Query Fun to Work

Users praise Codex for easily finding previously viewed spreadsheets and documents, describing the experience as more efficient and fun than manual search.



Industry Watch · 05.18

© 2026 FAV0 · AI Daily · Editorial page assembled by automated systems