May 23, 2026 · Saturday

“After some mathematical rewrite, turns out all of transformer is a series of GEMM + epilogue. Given a few optimized primitives, LLMs — and novice humans — can write speed-of-light kernels for all transformer ops.”

— Tri Dao, creator of FlashAttention
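
As a rough illustration of the "GEMM + epilogue" pattern in the quote, the sketch below multiplies two matrices and applies a small per-element function to each accumulator just before it is stored. It is a toy in plain Python, not Tri Dao's kernels; the names `gemm_epilogue` and `bias_relu` are hypothetical, and bias-plus-ReLU is just one example of an epilogue.

```python
from typing import Callable, List

Matrix = List[List[float]]

def gemm_epilogue(a: Matrix, b: Matrix,
                  epilogue: Callable[[float, int, int], float]) -> Matrix:
    """Compute epilogue(A @ B), applying the epilogue to each accumulator
    before it is stored, the way a fused kernel would."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for k in range(inner):
                acc += a[i][k] * b[k][j]      # the GEMM part
            out[i][j] = epilogue(acc, i, j)   # the epilogue part
    return out

bias = [0.5, -4.0]  # one bias value per output column (illustrative)

def bias_relu(acc: float, row: int, col: int) -> float:
    """Example epilogue: add a per-column bias, then apply ReLU."""
    return max(acc + bias[col], 0.0)

x = [[1.0, 2.0], [3.0, 4.0]]
w = [[1.0, 0.0], [0.0, 1.0]]  # identity weights keep the arithmetic easy to follow
y = gemm_epilogue(x, w, bias_relu)
print(y)  # [[1.5, 0.0], [3.5, 0.0]]
```

Swapping in a different epilogue function (a scale, a residual add, another activation) changes the operation without touching the multiply loop.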
Ethan Mollick demonstrates Gemini Omni’s native video editing capabilities.

Gemini Omni Proves Full Multimodality with Native Video Editing

Ethan Mollick highlights a differentiator for Gemini Omni that many overlook: it is fully multimodal, so it can natively edit video rather than merely generate it. Starting from the famous 1896 film of a train arriving at a station, he turned the train into a bullet train and a LEGO version, and added a time traveler, a centipede, and muppets, all while preserving the original motion and spatial coherence. This native editing capability sets Omni apart from generation-only video models and points toward a future where AI modifies existing media with surgical precision rather than starting from scratch.

Product & Research Roundup · May 23
Model Releases & Benchmarks · May 23
OPEN-SOURCE

llama.cpp Gets Full WebGPU Backend

llama.cpp and ggml now support a complete WebGPU backend, enabling large language models to run directly in the browser with GPU acceleration.

RESEARCH

Allen AI’s ArtifactLinker Predicts Model Benchmarks

ArtifactLinker automatically predicts which benchmarks a model will perform well on, addressing the problem that most models are only evaluated on a fraction of available tests.

SPEED

GLM-5.1-HighSpeed Hits 400 Tokens per Second

GLM-5.1-HighSpeed reaches 400 tokens per second on a flagship-tier LLM API, setting a new speed benchmark without shrinking the model to gain throughput.

MODEL

Qwen3.7-Max Launches with 1M Context for the Agent Era

Together AI introduces Qwen3.7-Max, Alibaba Qwen’s flagship model designed for the agent era, featuring 1-million-token context and leading benchmark performance.

BENCHMARK

ARC-AGI-3 Sees First Meaningful Score Jump

tufalabs raised its score from 0.68% to 1.17% (roughly 1.7x) in the ARC-AGI-3 competition, the first significant improvement on this reasoning benchmark.

TRAINING

Adaption AutoScientist Trains a Frontier-Level Model in Two Days

Adaption’s AutoScientist tool lets users train a frontier-level model within two days, with free compute resources offered throughout the next month.

TOOLING

Kakuna Automates Code Hardening via Subagent Checklists

Kakuna uses skill checklists with parallel sub-agents to harden codebases automatically, returning an audited version with all the boring work done.

CAPABILITIES

Modern LLMs Solve 100-Digit Multiplication Without Tools

teortaxesTex demonstrates that modern language models can perform 100-digit multiplication through chain-of-thought scaling alone, challenging the “embers of autoregression” thesis.

DATA

2 Billion Web Pages Now Queryable on Hugging Face via SQL

CommonCrawl’s April 2026 crawl data and URL index are now available on Hugging Face, enabling SQL-based analysis of over two billion web pages with zero download required.

PLATFORM

Xiaohongshu Adds AI Skill Upload Feature

Xiaohongshu now enables users to upload custom AI skills directly to the platform, opening a new distribution channel for AI capabilities within the popular social commerce app.

DEEPMIND

Project Genie Turns Google Street View into Interactive Worlds

Real U.S. locations transformed into interactive virtual worlds.

Google DeepMind’s Project Genie now integrates with Google Maps Street View, allowing users to take real U.S. locations and transform them into new, interactive virtual worlds through generative AI.

PRODUCT

Claude Pro Gets Auto Mode with Sonnet 4.6 and Opus 4.7

Claude Devs extends auto mode to Pro plan subscribers and adds support for Sonnet 4.6 alongside Opus 4.7, allowing users to press Shift+Tab and let Claude run autonomously.

PLATFORM

Microsoft Foundry Partners with Hugging Face on Open-Source Image Models

Three open-source image models are now available on Microsoft Foundry, bringing developers the largest catalog for AI innovation through the collaboration with Hugging Face.

PLATFORM

Cohere Command A+ Now Available as Managed Compute on Microsoft Foundry

Cohere’s latest open-source model Command A+ is now offered as a managed compute service on Microsoft Foundry, expanding enterprise deployment options.

RETAIL

Zalando Achieves 48-Hour 3D Scan to Storefront with NVIDIA AI

Zalando integrates NVIDIA Cosmos, Gen-3C, and DiffusionRenderer for rapid 3D production.

Zalando is scaling high-fidelity 3D product production by integrating ALLSIDES’ AI-powered photometric 3D scanning platform, powered by NVIDIA Cosmos, Gen-3C, and DiffusionRenderer. The pipeline goes from scan to storefront in just 48 hours.

TOOLING

LM Studio 0.4.14 Adds Multi-Token Prediction Support

LM Studio releases version 0.4.14 with Multi-Token Prediction (MTP) functionality, enabling faster inference by predicting multiple tokens simultaneously.

MOBILE

PixVerse App Adds On-Device Image Generation

The PixVerse mobile app now supports image generation from prompts or reference images, with three free generations for all users during the May 24–31 launch window.

INFRA

CommonCrawl Recommends Hugging Face Buckets for Large Training Datasets

CommonCrawl recommends Hugging Face Buckets for storing large, constantly evolving training datasets. The content-defined chunking technology reportedly reduces upload volumes by 75% by only transmitting changed portions of datasets.
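
The idea behind content-defined chunking can be sketched in a few lines of Python. This is a minimal illustration, not Hugging Face's implementation: the window size, the divisor, the use of CRC32 as the boundary hash, and the function names are all assumptions chosen for readability. Chunk boundaries are picked from the bytes themselves, so inserting data near the start of a file leaves most later chunks unchanged.

```python
import hashlib
import random
import zlib

WINDOW = 16    # bytes of context that decide a boundary (illustrative)
DIVISOR = 64   # about one boundary per 64 bytes on random data (illustrative)

def split_chunks(data: bytes) -> list[bytes]:
    """Cut after every position whose trailing WINDOW bytes hash to 0 mod DIVISOR."""
    chunks, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        if zlib.crc32(data[end - WINDOW:end]) % DIVISOR == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last boundary
    return chunks

def bytes_to_upload(old: bytes, new: bytes) -> int:
    """Count bytes in `new` whose chunks are not already stored from `old`."""
    known = {hashlib.sha256(c).digest() for c in split_chunks(old)}
    return sum(len(c) for c in split_chunks(new)
               if hashlib.sha256(c).digest() not in known)

rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(20_000))
new = b"ten bytes!" + old  # insert 10 bytes at the front, shifting everything after

upload = bytes_to_upload(old, new)
print(f"{upload} of {len(new)} bytes need uploading")
```

Because each boundary depends only on nearby bytes, the insertion disturbs the chunks around the edit and the rest are recognized as already stored. With fixed-size chunks, the same 10-byte insertion would shift every chunk and force a full re-upload.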

INSIGHT

“The Model Alone Is No Longer the Product”

OpenAI’s Greg Brockman observes that models in isolation are insufficient as products, pointing to a future where full application ecosystems define value in AI rather than raw model capabilities alone.

INSIGHT

Developers Reflect on Coding Before Codex

Greg Brockman captures the generational shift in software development, noting how quickly AI-assisted coding has become the default, making the pre-Codex era feel distant even to veteran engineers.

© 2026 FAV0 · AI Daily