30B-A3B Reasoning Model Reaches Gold Medal Level Across Physics and Math Olympiads
The co-first authors have released a 30B-parameter reasoning model with 3B active parameters that achieves gold-medal-level results on both physics and mathematics Olympiad evaluations. The compact architecture demonstrates that reasoning capability does not require massive parameter counts, challenging the scale-above-all assumption in current model development.
Raschka Publishes Visual Tour of LLM Architecture Advances: KV Sharing to Compressed Attention
Sebastian Raschka published a comprehensive blog article analyzing recent LLM architecture innovations for long-context efficiency. Covering Gemma 4's cross-layer KV sharing and per-layer embeddings, Laguna XS.2's layer-wise attention budget allocation, ZAYA1-8B's compressed convolutional attention, and DeepSeek V4's multi-head compression, the piece maps the emerging design patterns that reduce KV cache size and memory bandwidth to enable longer reasoning contexts.
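The savings these designs target can be illustrated with a back-of-the-envelope KV-cache size calculation. The formula and parameter values below are a generic sketch for a hypothetical 8B-class model, not figures from Raschka's post or from any of the named architectures:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, dtype_bytes=2):
    """Approximate KV cache size: K and V (factor of 2) per layer, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * dtype_bytes

# Hypothetical 8B-class model at 128K context, fp16 cache
baseline = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128,
                          seq_len=131072, batch=1)
# Cross-layer KV sharing, e.g. every pair of adjacent layers sharing one cache,
# halves the effective layer count
shared = kv_cache_bytes(n_layers=16, n_kv_heads=8, head_dim=128,
                        seq_len=131072, batch=1)
print(baseline / 2**30, shared / 2**30)  # 16.0 8.0 (GiB)
```

Compressed-attention schemes attack the same product from a different factor, shrinking `head_dim` or the per-token representation rather than the layer count.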
China Telecom Launches Token Phone Plan: 1 Yuan Buys 250K Tokens
The carrier declares Token services a new core business line.
China Telecom has integrated AI token purchases directly into phone bills, offering 250,000 tokens for 1 yuan across more than 30 major models, both text and multimodal. The company plans to make Token services a core business line going forward, turning what was a developer-centric abstraction into a utility as ordinary as voice minutes or data.
Codex Keyboard Shortcuts Now Customizable
OpenAI updated Codex based on user feedback, allowing users to customize keyboard shortcuts from settings to match their personal workflow instead of adapting to defaults.
Singapore Foreign Minister Uses NanoClaw on Raspberry Pi for Diplomatic Management
Foreign Minister Vivian Balakrishnan publicly shared his AI workflow using NanoClaw on a Raspberry Pi for managing diplomatic and parliamentary affairs. The stack works around WhatsApp integration limits and uses graph-based memory stored in SQLite.
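A graph memory layer on SQLite of the kind the stack mentions can be approximated with two tables, nodes and directed edges. The schema, helper names, and sample facts below are illustrative assumptions, not NanoClaw's actual implementation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id INTEGER PRIMARY KEY, label TEXT UNIQUE);
CREATE TABLE edges (src INTEGER, dst INTEGER, relation TEXT,
                    FOREIGN KEY (src) REFERENCES nodes(id),
                    FOREIGN KEY (dst) REFERENCES nodes(id));
""")

def add_fact(subj, rel, obj):
    # Upsert both endpoint nodes, then record the directed, labeled edge.
    for label in (subj, obj):
        conn.execute("INSERT OR IGNORE INTO nodes(label) VALUES (?)", (label,))
    conn.execute(
        "INSERT INTO edges SELECT a.id, b.id, ? FROM nodes a, nodes b "
        "WHERE a.label = ? AND b.label = ?", (rel, subj, obj))

def neighbors(label):
    # Return (target_label, relation) pairs for all outgoing edges.
    rows = conn.execute(
        "SELECT n2.label, e.relation FROM nodes n1 "
        "JOIN edges e ON e.src = n1.id JOIN nodes n2 ON n2.id = e.dst "
        "WHERE n1.label = ?", (label,))
    return rows.fetchall()

add_fact("meeting:2024-06-03", "attended_by", "delegation:X")
print(neighbors("meeting:2024-06-03"))  # [('delegation:X', 'attended_by')]
```

The appeal for a low-power device like a Raspberry Pi is that SQLite gives durable, queryable storage with no server process, while simple recursive SQL can walk multi-hop relations when needed.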
Replit AI Agent Helps Designer Ship Nearly Daily for 18 Months
A designer with no coding background has been using Replit's AI Agent to ship products almost daily for a year and a half, showcasing how low-code AI tools are lowering the barrier from idea to production deployment.
Anthropic and ZAI Models Lead Slides Arena Soft-Verification Tests
In slide generation arena results, Anthropic and ZAI models performed best on soft verification benchmarks.
TOKENSPEED_MLA Integrated into vLLM Blackwell Backend for DeepSeek-R1 and Kimi K2.5
Lightseek announced TOKENSPEED_MLA has been integrated into vLLM, optimizing inference on Blackwell GPUs.
Grok CLI Gains Vercel Plugin for Cloud Deployment Superpowers
Grok CLI now supports Plugins and Skills. Installing the Vercel Plugin gives Grok seamless cloud deployment on Vercel from the command line.
Vercel Protects Agent Deployments Behind SSO for Production Environments
Vercel enables SSO protection via Okta for AI-generated app deployments from v0, Codex, Claude and others, creating a secure intranet for agent-built applications.
Open Model Roundup: Gemma 4, DeepSeek V4, Kimi K2.6, MiMo 2.5 and More
Interconnects.ai released its 21st newsletter covering the latest wave of open model releases.
US CAISI and Epoch AI Diverge in Assessments of Open vs. Closed Model Gap
Two leading AI research organizations now disagree on how wide the performance gap between open and closed models really is.
Codex Can Control Multiple Computers Across Devices in a Single Project
op7418 discovered that Codex can hand off control among multiple computers within a single project, all driven from the ChatGPT app, without the user physically moving between devices.
Codex Side Chat System Prompt Revealed: Lightweight Q&A Without Disturbing Main Thread
Dotey shared the system prompt behind Codex's side chat feature, designed for non-disruptive exploration.
Key to AI Long Tasks: Small-Stage Planning and Verification at Each Step
Dotey argues that breaking tasks into small phases with explicit verification methods such as unit tests is essential for keeping AI productive over extended sessions.
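The pattern Dotey describes, small stages each gated by an explicit check, can be sketched as a loop that stops at the first failed verification. The stage list and checks below are hypothetical stand-ins for real unit tests:

```python
def run_stages(stages):
    """Execute (name, action, verify) stages in order; halt at the first failed check."""
    for name, action, verify in stages:
        result = action()
        if not verify(result):
            return f"failed at stage: {name}"
    return "all stages passed"

# Hypothetical stages: each pairs a unit of work with its own verification,
# mirroring unit tests gating each phase of a long AI coding session.
stages = [
    ("parse input", lambda: [1, 2, 3], lambda r: isinstance(r, list)),
    ("transform",   lambda: sum([1, 2, 3]), lambda r: r == 6),
]
print(run_stages(stages))  # all stages passed
```

The key design choice is that verification is defined per stage, before the work runs, so a long session fails fast and locally rather than drifting for many steps on a bad intermediate result.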
If you're not obsessed with the research problem you're working on, for its own sake, you're unlikely to succeed. Intrinsic motivation is far more powerful than external rewards.
François Chollet
Tokens Are Rapidly Becoming the Universal Input for Solving Problems
Sam Altman observed that tokens are replacing traditional structured inputs across an expanding range of problem domains.
ChatGPT Plus Now Available Nationwide Across Malta
OpenAI announced that ChatGPT Plus subscription service is now fully accessible across the entire country of Malta.
Using Codex from the ChatGPT App Is a Freeing Experience
Altman remarked that Codex on mobile makes you realize how tethered you normally are to your desktop computer.
Using GPT for Defensive Security
Altman highlighted the growing role of GPT and LLMs in defensive cybersecurity applications.
Codex Is in a Category of Its Own: Agentic Excel on Mac
Altman described Codex as being in a distinct product category, calling it agentic Excel on Mac.
Codex for Improving Computational Complexity
Altman shared Codex's utility for analyzing and improving algorithmic computational complexity.
KempeLab Chief to Leave Meta FAIR After Leading LLM Reasoning Research
The head of KempeLab, which focused on LLM reasoning research at Meta FAIR, announced their departure after nearly two years.
New Benchmark Tests Models' Ability to Pinpoint Reasoning Errors Mid-Trace
A new benchmark frames error localization as a regression task: models must identify where a long reasoning trace first goes wrong, testing fine-grained error detection.
On-Policy RL and Distillation Found to Rely on Labeled Final-Answer Data
Research shows on-policy reinforcement learning and distillation algorithms typically depend on privileged labeled answers and process rewards.
Neuralink Helps Paralyzed Patient Regain Ability to Paint
Neuralink shared the story of a patient paralyzed in a car accident who regained painting ability through a brain-computer interface after surgery.
Claude Resets Five-Hour and Weekly Usage Limits for the Weekend
Claude users noted the reset of five-hour and weekly quota limits, renewing access for weekend experimentation.
Compressed 120B Models Shown to Retain Substantial General Capability
Research confirms that compressed large models at the 120B parameter scale preserve strong general performance, with knowledge retention remaining an active area of evaluation.