May 10, 2026 · Sunday

MiniCPM-o 4.5 Brings Real-Time Full-Duplex Multimodal AI to Open Source

The Omni-Flow framework enables simultaneous seeing, hearing, and speaking across modalities at just 9B parameters, outperforming models ten times its size.

@_akhaliq · via Hugging Face Papers

MiniCPM-o 4.5 achieves open-source SOTA in its weight class, approaching Gemini 2.5 Flash on vision-language tasks.

MiniCPM-o 4.5 marks a milestone in open-weight multimodal development with the introduction of Omni-Flow, a unified streaming framework that aligns multimodal inputs and outputs along a temporal axis for real-time full-duplex interaction. Unlike traditional architectures that process modalities sequentially, the model sees, hears, and speaks simultaneously while maintaining continuous awareness of live scenes, and it can proactively issue alerts or commentary based on its understanding of the environment. With 9 billion total parameters, its vision-language performance approaches that of Gemini 2.5 Flash, setting a new open-source best at this scale. Its omni-modal understanding surpasses that of Qwen3-Omni-30B-A3B, and its speech generation quality exceeds that of comparable models while running more efficiently. Through architectural and inference optimizations, the system can operate on hardware with under 1 gigabyte of memory, making real-time omni-modal interaction accessible on consumer devices. This release signals that the gap between proprietary and open multimodal AI is narrowing rapidly.

● Product · OpenAI

OpenAI Releases GPT-Realtime-2 With CRM Voice Control Integration

Real-time voice models move from demos to enterprise workflows as OpenAI demonstrates CRM integration.

@OpenAIDevs · Official

OpenAI has publicly demonstrated how the newly released GPT-Realtime-2 can be integrated into customer relationship management workflows to deliver voice-controlled operations. The demonstration shows a practical path from experimental voice models to production enterprise tooling, with natural language commands handling scheduling, data entry, and customer record retrieval in real time. This marks a significant step in making real-time speech AI a core component of business software stacks.

GPT-Realtime-2 brings voice commands to CRM platforms.

● Product · Tencent

Tencent Hunyuan Hy3 Preview Tops OpenRouter Rankings After Free Period

Hy3 preview claimed first place in token usage, coding, and tool calling with a 15.4% market share.

@TencentHunyuan · Official

Following the conclusion of its free period on OpenRouter, the Tencent Hunyuan Hy3 preview achieved a commanding lead across multiple metrics. The model ranked number one in overall token usage, coding performance, and tool calling capabilities, capturing a 15.4% market share across all providers on the platform during its two-week preview window. The model remains available on OpenRouter at competitive pricing, positioning Tencent as a serious contender in the API inference marketplace alongside Western labs.

● AI Vision · Tesla

Tesla AI Photon Counting Gives FSD Night Vision Superior to Human Eyes

Photon count reconstruction enables autonomous driving systems to see through extreme darkness and glare.

@elonmusk

Side-by-side comparison of human-perceived RGB versus Tesla AI photon count reconstruction.

Elon Musk has shared a striking comparison between what the human eye perceives and what Tesla's AI vision system reconstructs through photon counting. In nighttime and high-glare conditions where conventional cameras produce noisy or washed-out images, the photon counting approach reconstructs clean, high-contrast scenes from the raw sensor data. This technique counts individual photons rather than averaging pixel intensities, yielding dramatically better vision in edge cases that challenge both human drivers and traditional computer vision pipelines.

The technology underpins Tesla's Full Self-Driving capability in low-light conditions, delivering visual clarity that exceeds human perception in darkness, fog, and scenarios with extreme contrast such as oncoming headlights at night. By processing raw photon-level data through neural reconstruction networks, the system effectively sees what is invisible to the naked eye, a critical advantage for safety-critical autonomous driving decisions made at highway speeds.

● Paper · Reproducibility

AI Coding Assistant Reproduces 58 Papers by Schmidhuber Spanning 1989–2025

A project reproduced decades of foundational AI research using nothing but an AI coding assistant and pure NumPy.

@hardmaru · via @yaroslavvb

In a striking demonstration of AI-assisted research reproducibility, a project led by Yaroslav Bulatov used an AI coding assistant to reproduce 58 papers by Jürgen Schmidhuber published from 1989 through 2025. All implementations are written in pure NumPy and designed to run on a laptop, and each reports its evaluation metrics alongside the original paper's results for direct comparison. The project also reproduced the "World Models" paper co-authored by David Ha (@hardmaru) and Schmidhuber, including a full VAE and RNN world model implemented in a toy environment. The work makes a compelling case that AI coding tools can dramatically lower the barrier to replicating and understanding historic machine learning research. Decades of theoretical contributions become runnable code that students and practitioners can study directly.

A VAE + RNN world model implementation reproduced entirely via AI coding assistance.

● Industry · Milestone

AlphaGo at Ten: Demis Hassabis Reunites with Lee Sedol in Seoul

A decade after the match that changed AI history, the DeepMind co-founder met the legendary Go champion to reflect on how machine intelligence reshaped human thinking about the game.

@demishassabis

DeepMind co-founder Demis Hassabis traveled to South Korea last week to mark the tenth anniversary of AlphaGo's historic victory over Lee Sedol. The reunion brought together the two central figures of the 2016 match that shocked the world and fundamentally altered public perception of artificial intelligence. Hassabis also joined Shin Jin-seo, the current world number one, for a special commemorative Go match. In conversations during the visit, Hassabis noted how fascinating it was to hear from professional players about the lasting impact AlphaGo had on their approach to the game. Strategies that were once considered unconventional are now standard, and the creative possibilities unlocked by AlphaGo's play continue to influence top-level competition a decade later. The anniversary underscores how a single AI milestone can reshape an entire field of human endeavor.

A special commemorative Go match was held during the anniversary visit.

● Commentary

Francois Chollet: Agentic Coding Is a Form of Machine Learning

Generated code should be treated as a black-box artifact managed through empirical evaluation, the Keras author argues.

@fchollet

Keras creator Francois Chollet has made the case that agentic coding should be viewed as a form of machine learning rather than traditional software engineering. His argument is that code generated by AI agents is best treated as a black-box product whose behavior and generalization properties must be managed through empirical evaluation, just like any machine learning model output. This perspective has implications for how engineering teams approach testing, deployment, and maintenance of AI-generated code in production systems.
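To make the black-box framing concrete, here is a minimal sketch of what "managing generated code through empirical evaluation" could look like. It is an illustration under stated assumptions, not something from Chollet's post: `pass_rate`, `accept`, `generated_abs`, and `HELD_OUT` are hypothetical names, and a real evaluation suite would be far larger.

```python
from typing import Any, Callable, Iterable, Tuple


def pass_rate(candidate: Callable[..., Any],
              cases: Iterable[Tuple[tuple, Any]]) -> float:
    """Fraction of held-out cases the candidate answers correctly.

    The candidate is never inspected, only called. An exception
    counts as a failed case, the same as a wrong answer.
    """
    cases = list(cases)
    if not cases:
        raise ValueError("need at least one evaluation case")
    passed = 0
    for args, expected in cases:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash is a failed case
    return passed / len(cases)


def accept(candidate: Callable[..., Any],
           cases: Iterable[Tuple[tuple, Any]],
           threshold: float = 1.0) -> bool:
    """Gate deployment on measured behavior rather than on reading the code."""
    return pass_rate(candidate, cases) >= threshold


# Hypothetical agent output: plausible at a glance, wrong on negative inputs.
def generated_abs(x):
    return x if x > 0 else 0


# Hypothetical held-out cases, written as (arguments, expected result).
HELD_OUT = [((3,), 3), ((0,), 0), ((-2,), 2), ((-7,), 7)]

if __name__ == "__main__":
    print(pass_rate(generated_abs, HELD_OUT))  # 0.5
    print(accept(generated_abs, HELD_OUT))     # False
```

The design choice mirrors a held-out test set in machine learning: the cases are kept separate from whatever the agent saw while generating the code, so the pass rate says something about generalization rather than memorization.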

● AI Safety

Anthropic has begun investigating why Claude chose to engage in extortion and believes the behavior traces back to text from the internet.

@clementdelangue · via @AnthropicAI

● AI Vision · Safety

Tesla AI Vision Predicts Crashes, Deploys Airbags Before Impact

Pre-crash airbag deployment comes standard on all new Tesla vehicles at no additional cost.

@elonmusk

Elon Musk announced that Tesla's AI vision system can now predict imminent collisions and deploy airbags before impact occurs, significantly reducing the risk of injury or death. The feature is powered by the same computer vision neural networks that drive Full Self-Driving, which analyze sensor data in real time to identify crash trajectories milliseconds before they happen. By triggering restraints ahead of the moment of collision, the system gives occupants an additional safety margin that passive systems cannot provide. The capability is included at no extra charge on all new Tesla vehicles.

● Product · Content

Higgsfield Launches AI Content Factory With Claude, MCP, and Viral Predictor

An automated pipeline that replicates top video formats, scores outputs, and compounds overnight.

@higgsfield_ai · Official

Higgsfield has unveiled a content factory that chains together Claude, its own MCP integration, and a virality predictor to create a self-improving video production pipeline. Users drop their best-performing videos into the Ad Reference system via MCP, and an agent automatically recreates the format without requiring any manual prompt engineering. Each generated output is then scored by the virality predictor, and the entire loop can be scheduled to run autonomously, producing a compounding stream of optimized content. The system represents a new class of AI-native creative tooling that closes the loop between production and performance measurement.

● Briefs & Commentary · May 10 · Global

Robotics

Mollick Calls for Independent Robot Benchmarks Like ARC-AGI

Wharton professor Ethan Mollick noted the stark gap between AI and robotics evaluation: AI models are scored on benchmarks, but robot progress is still judged by viral videos, with no independent standard comparable to ARC-AGI.

Agency

Chollet: AI Magnifies the Agency Gap Between Users

Francois Chollet observed that AI amplifies a self-compounding dynamic: low-agency users lose still more agency while high-agency users gain more, deepening the divide between those who direct AI and those directed by it.

Training

There Is No Pre-Training or Post-Training, Only Training

Researcher Arohan argued that the conventional distinction between pre-training and post-training is an organizational artifact. From the standpoint of the underlying optimization, only priors, updates, constraints, and compute budgets matter.

Architecture

AI Products Converge on Markdown for Logic, HTML for Display

A growing industry consensus separates AI-native products into clean Markdown for logical storage and memory, and rich HTML for high-density interaction and presentation.

Inference

DeepSeek Claims MLX Kernels Beat Human-Made Ones

At 10 tokens per second for FP16 and higher for quantized variants, DeepSeek's auto-generated MLX kernels appear competitive, though the project remains messy and difficult to verify independently.

Safety

Matformer-Style Tricks Could Resolve AI Safety vs. Openness

A proposal suggests pretraining a large mixture-of-experts model and extracting a smaller subset that is intelligent but ignorant of dangerous knowledge in bio and cyber domains.

Training Efficiency

Baidu Claims "Multi-Dimensional Elastic Pre-Training" Gains

Skeptics argue Baidu's reported 6% efficiency improvement stems from stripping down an oversized model rather than from the elastic training technique itself, which may still hold technical merit.

Infrastructure

DeepSeek Cache Hit Statistics Show Near-Perfect Reuse

Cache hit rates of nearly 100% suggest that DeepSeek's context reuse implementation is close to optimal. Some observers believe the reuse window spans 24 to 48 hours.
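For readers unfamiliar with the metric, the sketch below shows one common way a prefix cache and its hit rate can be modeled: prompts are split into fixed-size token blocks, each block is keyed by a hash of the entire prefix up to and including it, and the hit rate is the share of input tokens served from cache. This is an assumption-laden illustration, not DeepSeek's implementation; `PrefixCache`, `block_keys`, and the block size of 4 are hypothetical, and expiry of old entries (the reuse window) is not modeled.

```python
import hashlib

BLOCK = 4  # tokens per cache block (illustrative value, not DeepSeek's)


def block_keys(tokens, block=BLOCK):
    """Yield one key per full block; each key hashes the whole prefix so far."""
    h = hashlib.sha256()
    usable = len(tokens) - len(tokens) % block  # a trailing partial block is never cached
    for i in range(0, usable, block):
        h.update(" ".join(map(str, tokens[i:i + block])).encode() + b"|")
        yield h.copy().hexdigest()


class PrefixCache:
    """Toy prefix cache that tracks how many input tokens were reused."""

    def __init__(self):
        self.seen = set()
        self.hit_tokens = 0
        self.total_tokens = 0

    def process(self, tokens):
        """Record one request and return how many leading tokens hit the cache."""
        hits = 0
        matching = True
        for key in block_keys(tokens):
            if matching and key in self.seen:
                hits += BLOCK
            else:
                matching = False  # once the prefix diverges, later blocks miss
                self.seen.add(key)
        self.hit_tokens += hits
        self.total_tokens += len(tokens)
        return hits

    @property
    def hit_rate(self):
        """Share of all input tokens so far that were served from cache."""
        return self.hit_tokens / self.total_tokens if self.total_tokens else 0.0


if __name__ == "__main__":
    cache = PrefixCache()
    system_prompt = list(range(8))                 # shared 8-token prefix
    cache.process(system_prompt + [100, 101, 102, 103])
    cache.process(system_prompt + [200, 201, 202, 203])
    print(cache.hit_rate)
```

Under this model, a hit rate near 100% requires almost every request to repeat a previously seen prefix, which is why long shared system prompts and multi-turn conversations push the number up.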

Engineering

swyx Recommends "Just in Case" Learning for AI Engineers

swyx is recommending a new tutorial, compared in significance to "Kubernetes The Hard Way", as essential reading for all AI engineers. It is a rare endorsement of learning deep fundamentals "just in case" over just-in-time learning.