May 10, 2026 · Sunday

MiniCPM-o 4.5 Brings Real-Time Full-Duplex Multimodal AI to Open Source

The Omni-Flow framework enables simultaneous seeing, hearing, and speaking at just 9B parameters, outperforming larger models such as Qwen3-Omni-30B-A3B.

MiniCPM-o 4.5 achieves open-source SOTA in its weight class, approaching Gemini 2.5 Flash on vision-language tasks.

MiniCPM-o 4.5 marks a milestone in open-weight multimodal development with the introduction of Omni-Flow, a unified streaming framework that aligns multimodal inputs and outputs along a temporal axis for real-time full-duplex interaction. Unlike traditional architectures that process modalities sequentially, the model sees, hears, and speaks simultaneously while maintaining continuous awareness of live scenes, and it can proactively issue alerts or commentary based on its understanding of the environment. With 9 billion total parameters, its vision-language performance approaches that of Gemini 2.5 Flash, setting a new open-source best at this scale. Omni-modal understanding surpasses Qwen3-Omni-30B-A3B, and speech generation quality exceeds that of comparable models while running more efficiently. Through architectural and inference optimizations, the system reportedly runs on hardware with under 1 gigabyte of memory, making real-time omni-modal interaction accessible on consumer devices. The release signals that the gap between proprietary and open multimodal AI is narrowing rapidly.

OpenAI Releases GPT-Realtime-2 With CRM Voice Control Integration

Real-time voice models move from demos to enterprise workflows as OpenAI demonstrates CRM integration.

OpenAI has publicly demonstrated how the newly released GPT-Realtime-2 can be integrated into customer relationship management workflows to deliver voice-controlled operations. The demonstration shows a practical path from experimental voice models to production enterprise tooling, with natural language commands handling scheduling, data entry, and customer record retrieval in real time. This marks a significant step in making real-time speech AI a core component of business software stacks.

GPT-Realtime-2 brings voice commands to CRM platforms.

Tencent Hunyuan Hy3 Preview Tops OpenRouter Rankings After Free Period

Hy3 preview claimed first place in token usage, coding, and tool calling with a 15.4% market share.

Following the conclusion of its free period on OpenRouter, the Tencent Hunyuan Hy3 preview held a commanding lead across multiple rankings. The model ranked number one in overall token usage and in the coding and tool-calling categories, capturing a 15.4% market share across all providers on the platform during its two-week preview window. It remains available on OpenRouter at competitive pricing, positioning Tencent as a serious contender in the API inference marketplace alongside Western labs.



AI Coding Assistant Reproduces 58 Papers by Schmidhuber Spanning 1989–2025

A project reproduced decades of foundational AI research using nothing but an AI coding assistant and pure NumPy.

In a striking demonstration of AI-assisted research reproducibility, a project led by Yaroslav Bulatov used an AI coding assistant to reproduce 58 papers by Jürgen Schmidhuber from 1989 through 2025. All implementations are written in pure NumPy and are designed to run on a laptop, with evaluation metrics reported alongside the original paper results for direct comparison. The project also reproduced the "World Models" paper co-authored by Schmidhuber and David Ha (hardmaru), including a full VAE and RNN world model implementation within a toy environment. The work makes a compelling case that AI coding tools can dramatically lower the barrier to replicating and understanding historic machine learning research, turning decades of theoretical contributions into runnable code that students and practitioners can study directly.

A VAE + RNN world model implementation reproduced entirely via AI coding assistance.
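
For readers unfamiliar with the architecture, the VAE-plus-RNN pairing can be sketched in a few lines of pure NumPy. This is an illustrative sketch only, not code from the project: the weights are random and untrained, a plain tanh RNN stands in for whatever recurrent model the reproduction uses, and all names and sizes (`OBS_DIM`, `encode`, `rnn_step`, `rollout`) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes, chosen only for illustration.
OBS_DIM, Z_DIM, H_DIM, ACT_DIM = 16, 4, 8, 2


def init(shape):
    """Small random weights; a real implementation would train these."""
    return rng.normal(0.0, 0.1, size=shape)


# Vision part: a linear VAE encoder and decoder.
W_mu, W_logvar = init((OBS_DIM, Z_DIM)), init((OBS_DIM, Z_DIM))
W_dec = init((Z_DIM, OBS_DIM))

# Memory part: a vanilla RNN that predicts the next latent vector.
W_in = init((Z_DIM + ACT_DIM, H_DIM))
W_h = init((H_DIM, H_DIM))
W_out = init((H_DIM, Z_DIM))


def encode(obs):
    """Map an observation to a sampled latent via the reparameterization trick."""
    mu = obs @ W_mu
    logvar = obs @ W_logvar
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps
    return z, mu, logvar


def decode(z):
    """Reconstruct an observation from a latent vector."""
    return z @ W_dec


def kl_to_standard_normal(mu, logvar):
    """KL divergence between N(mu, exp(logvar)) and N(0, I)."""
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar))


def rnn_step(z, action, h):
    """Advance the hidden state and predict the next latent."""
    h_next = np.tanh(np.concatenate([z, action]) @ W_in + h @ W_h)
    return h_next @ W_out, h_next


def rollout(obs_seq, act_seq):
    """Encode each observation, then let the RNN predict the following latent."""
    h = np.zeros(H_DIM)
    preds = []
    for obs, act in zip(obs_seq, act_seq):
        z, _, _ = encode(obs)
        z_pred, h = rnn_step(z, act, h)
        preds.append(z_pred)
    return np.stack(preds)
```

Training would add a reconstruction loss on `decode(z)`, the KL term above, and a prediction loss between each `z_pred` and the next encoded latent; those steps are omitted here to keep the sketch short.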



Tesla AI Vision Predicts Crashes, Deploys Airbags Before Impact

Pre-crash airbag deployment comes standard on all new Tesla vehicles at no additional cost.

Elon Musk announced that Tesla's AI vision system can now predict imminent collisions and deploy airbags before impact occurs, significantly reducing the risk of injury or death. The feature is powered by the same computer vision neural networks that drive Full Self-Driving, which analyze sensor data in real time to identify crash trajectories milliseconds before they happen. By triggering restraints ahead of the moment of collision, the system gives occupants an additional safety margin that passive systems cannot provide. The capability is included at no extra charge on all new Tesla vehicles.

Higgsfield Launches AI Content Factory With Claude, MCP, and Viral Predictor

An automated pipeline that replicates top video formats, scores outputs, and compounds overnight.

Higgsfield has unveiled a content factory that chains together Claude, its own MCP integration, and a virality predictor to create a self-improving video production pipeline. Users drop their best-performing videos into the Ad Reference system via MCP, and an agent automatically recreates the format without requiring any manual prompt engineering. Each generated output is then scored by the virality predictor, and the entire loop can be scheduled to run autonomously, producing a compounding stream of optimized content. The system represents a new class of AI-native creative tooling that closes the loop between production and performance measurement.
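
The generate, score, and repeat loop described above can be sketched as follows. Everything here is a hypothetical stand-in rather than Higgsfield's API: `recreate_format` plays the role of the agent, `predict_virality` returns a random score in place of the real predictor, and feeding the top-scoring outputs back in as references is only one assumed reading of how the loop "compounds".

```python
import random
from dataclasses import dataclass


@dataclass
class Video:
    fmt: str            # label for the video's format
    score: float = 0.0  # predicted virality, higher is better


def recreate_format(reference, rng):
    """Hypothetical stand-in for the agent that recreates a reference format."""
    return Video(fmt=f"{reference.fmt}+v{rng.randint(0, 999)}")


def predict_virality(video, rng):
    """Hypothetical stand-in for the virality predictor (random score here)."""
    return rng.random()


def run_cycle(references, rng, variants_per_ref=3, keep=2):
    """Generate variants of each reference, score them, keep the top scorers."""
    candidates = list(references)  # current references compete with new variants
    for ref in references:
        for _ in range(variants_per_ref):
            variant = recreate_format(ref, rng)
            variant.score = predict_virality(variant, rng)
            candidates.append(variant)
    candidates.sort(key=lambda v: v.score, reverse=True)
    return candidates[:keep]
```

Because the current references stay in the candidate pool, the best score in this sketch can never drop from one scheduled cycle to the next, which is the simplest way to model a loop that improves over repeated runs.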


● Briefs & Commentary May 10 · Global
Architecture

AI Products Converge on Markdown for Logic, HTML for Display

A growing industry consensus separates AI-native products into clean Markdown for logical storage and memory, and rich HTML for high-density interaction and presentation.

Inference

DeepSeek Claims MLX Kernels Beat Human-Made Ones

At 10 tokens per second for FP16 and higher for quantized variants, DeepSeek's auto-generated MLX kernels appear competitive, though the project remains messy and difficult to verify independently.

Safety

MatFormer-Style Tricks Could Resolve AI Safety vs. Openness

A proposal suggests pretraining a large mixture-of-experts model and extracting a smaller subset that is intelligent but ignorant of dangerous knowledge in bio and cyber domains.

Training Efficiency

Baidu Claims "Multi-Dimensional Elastic Pre-Training" Gains

Skeptics argue Baidu's reported 6% efficiency improvement stems from stripping down an oversized model rather than from the elastic training technique itself, which may still hold technical merit.

Infrastructure

DeepSeek Cache Hit Statistics Show Near-Perfect Reuse

Cache hit rates of nearly 100% suggest DeepSeek's context reuse implementation is close to optimal. Some observers believe cached context is likely retained for 24 to 48 hours.

Engineering

swyx Recommends "Just in Case" Learning for AI Engineers

A new tutorial compared in significance to "Kubernetes The Hard Way" is being recommended as essential reading for all AI engineers, marking a rare endorsement of deep fundamentals over just-in-time learning.