June 29, 2026 · Monday

Baidu Unlimited-OCR Parses Entire Books Without Zoom

Now integrated with vLLM and powered by Reference Sliding Window Attention, Baidu Unlimited-OCR achieves constant KV cache during full book parsing, eliminating memory bloat at any output length.


Baidu's Unlimited-OCR joins the vLLM ecosystem with a breakthrough for long-document processing: one-shot parsing of entire books without memory bloat. The core innovation, Reference Sliding Window Attention (R-SWA), keeps the KV cache at a fixed size regardless of how many pages are processed, so memory use and decoding speed stay flat however long the output grows. This is particularly significant for academic and legal document processing, where entire volumes need to be digitized in a single pass. The vLLM integration gives the model the framework's optimized inference pipeline, making large-scale OCR deployment practical for the first time.
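For readers curious why a windowed cache stays flat, here is a toy sketch. It is not Baidu's implementation, and the brief does not describe how R-SWA picks its reference tokens. The sketch assumes a small set of pinned "reference" entries plus a sliding window of the most recent entries; the names BoundedKVCache and simulate, and the sizes used, are hypothetical.

```python
from collections import deque


class BoundedKVCache:
    """Toy cache: pinned reference entries plus a sliding window of recent entries.

    Assumption for illustration only; the real R-SWA design may differ.
    """

    def __init__(self, reference, window):
        self.reference = list(reference)    # pinned entries, never evicted
        self.recent = deque(maxlen=window)  # oldest entry is dropped when full

    def append(self, entry):
        self.recent.append(entry)

    def size(self):
        return len(self.reference) + len(self.recent)


def simulate(num_steps, num_reference=4, window=8):
    """Return (bounded_size, full_size) after each decoding step."""
    cache = BoundedKVCache(range(num_reference), window)
    full_size = num_reference  # a conventional cache keeps every entry
    sizes = []
    for step in range(num_steps):
        cache.append(("tok", step))
        full_size += 1
        sizes.append((cache.size(), full_size))
    return sizes


if __name__ == "__main__":
    bounded, full = simulate(100)[-1]
    print(f"bounded cache entries: {bounded}, full cache entries: {full}")
```

In this toy setup the bounded cache stops growing once the window fills, while a cache that keeps every entry grows by one per decoded token.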



"Getting regulated by a government because your model is 'too dangerous' is the best marketing, especially for enterprise sales. Everyone is trying to get it now."

— Clement Delangue, CEO of HuggingFace

DeepSpec: DeepSeek Curated Model & Data Collection

deepseek-ai releases DeepSpec, a curated collection of models, datasets, papers, and Spaces on HuggingFace, providing the community with centralized access to their research artifacts and resources.

Emollick: GLM-5.2 Solid but Not Frontier; Open Weights Closing the Gap

GLM-5.2 is good but still behind GPT-5.5 and Opus 4.8, according to AI researcher Emollick. Crucially, open-source models have now crossed into GPT-5.2 capability territory — a milestone showing open weights are genuinely chasing the frontier.

Anthropic Launches Claude Tag: @Claude in Slack Channels

Claude Tag (Beta) lets Team and Enterprise users @Claude in Slack channels like a colleague. Admins pre-configure which channels, tools, data sources, and code repositories Claude can access, enabling anyone in the channel to assign tasks directly.

DeepSeek Accelerates Models via DSpark; Gemma4-12B May Be Best Local Model

DeepSeek has released accelerated models through DSpark, with Gemma4-12B as the standout. Gemma4-12B may include vision capabilities and could become the best local model in its weight class by a significant margin. Qwen 3.5 is excluded because DSpark lacks linear attention support.

Repo Prompt Open-Sourced After Creator Joins OpenAI

The community edition of Repo Prompt is now open-source on GitHub. After OpenAI's developer experience lead invited its creator, Provencher, to join the team, he first made the project free for paid users, then fully open-sourced it as a condition of accepting the role.

Goodside Reflects: LLMs Shattered the Expected AI Development Path

AI researcher Riley Goodside recalls believing for 25 years that AI would emerge through recursive self-improvement via reinforcement learning, not chatbots. He viewed the old analogy — making AI through chatbots is like making real flowers by sculpting wax — as essentially correct. The LLM era overturned that expectation, proving language-first approaches could achieve what iterative self-play could not.

Analysis Infers DeepSeek Profit Margins at 40-60%

TeortaxesTex calculates from DSpark data that DeepSeek's margins fall between 40% and 60%, and potentially exceed 80% with GB300 hardware, refuting the common assumption that AI inference services operate at a loss.


Industry Briefs · Jun 29
Product

fofrAI: Gemini 3.5 Flash Is an Excellent Workhorse for Sub-Agents

AI developer fofrAI praises Gemini 3.5 Flash as determined, fast, and perfect for sub-agents, calling it the go-to workhorse for everyday AI tasks.

Research

Model Routers Underestimate Difficulty of Non-Math Tasks

All model routers undervalue non-verifiable tasks like innovation, marketing, and qualitative analysis, sending them to weaker models precisely where smarter models matter most.

Product

Adobe Firefly Boards Integrated Directly into Photoshop

Adobe Firefly Boards is now integrated into Photoshop, allowing users to collaborate and create using the Firefly AI canvas directly inside the design tool.

Policy

Emollick Questions Whether Gemini 3.5 Pro Is Export-Controlled

The question of whether Gemini 3.5 Pro faces export controls raises broader questions about regulatory boundaries for frontier AI systems.

Industry

Bangalore Auto Driver Shares How ChatGPT Helps Daily Life

An auto-rickshaw driver in Bengaluru uses ChatGPT for daily tasks, praising its ability to remember all queries and link conversations from four months ago. OpenAI has responded.

Paper

DiffusionBench: Comprehensive Benchmark for Diffusion Transformers

A new paper introduces DiffusionBench for holistic evaluation of diffusion transformers, providing an important reference for the research community.

Industry

Clement Delangue: AI's Future Is Multi-Model and Open-Source

HuggingFace CEO believes the future of AI involves multiple models, the majority of which will be open-source, facilitated by platforms like HuggingFace.

Research

China's Top AI Researchers: More Output, Sharply Lower Retention

A Marco Polo snapshot of NeurIPS, ICML 2024, and ICLR 2025 data shows China producing more top researchers while retention plummets. The US advantage over China is 5.5x.

Product

Baidu Unlimited-OCR Tops HuggingFace Trending Models

Baidu Unlimited-OCR becomes the number one trending model on HuggingFace, drawing community attention for its ability to parse entire books in a single pass.



DeepSeek DSpark Speed Uplift Raises Questions About a Secret Upgrade

TeortaxesTex notes that while DeepSeek claims DSpark has been serving since May 8, the speed has clearly jumped in the last few days. The sudden improvement suggests they may be testing a better version behind the scenes, raising speculation about what comes next.



More Briefs · Jun 29
Industry

MiniMax and Cysic Hackathon Winners Announced

Winners of the MiniMax AI and Cysic co-hosted hackathon are announced, showcasing projects built on the M3 model.

Industry

Nathan Lambert: Open-Source Model Diversity Brings Hope

AI researcher Nathan Lambert says the diversity of companies building open-source models is encouraging, even though their stories are often overshadowed by frontier models.

Research

Mental Models for Benchmark Difficulty and Eval Design

Researcher CWolfe shares thoughts on benchmark difficulty, noting that evaluation design often relies on greedy heuristics and needs more systematic approaches.

Events

SF AI Engineer Week Kicks Off at Moscone Center

AI Engineer Week starts in San Francisco with badge pick-up and new engineer orientation from 5-9 PM today at Moscone Center.

Industry

Emollick Jokes: What Model Is OpenAI Saving GPT-6 For?

Given the naming jumps in recent releases, Emollick asks which model name OpenAI is reserving for GPT-6.

Research

TeortaxesTex Corrects: 'Duibiao' Means Head-to-Head, Not Imitate

After a correction from a native speaker, TeortaxesTex notes that 'duibiao' means competing head-on rather than imitating, which makes him optimistic about the benchmark project.

Policy

Nathan Lambert Warns of 'Vibe Regulation' Consequences

Nathan Lambert warns that vibe regulation of frontier models has real and frightening consequences for the AI industry.

Research

Arohan: Kernel Debates Will Drive Better Linear Algebra

AI researcher Arohan predicts that current debates in the field will lead to significantly better linear algebra kernels.

Product

Midjourney 8.1 HD Mode Fixes Hand Generation Details

Hand generation is still problematic in Midjourney 8.1 standard mode, but an HD rerun significantly improves hands by regenerating detail rather than simply upscaling.

Industry

VentureTwins: No Surprise AI-Generated Junk Writing Is Popular

VentureTwins argues there is no reason to be shocked that AI-generated junk writing is trending, since popular books already reflect a garbage culture.

Industry

Oran_ge on Programmer Choices After AI Boosts Code Efficiency

If code efficiency improves 10x, should programmers write 100x more code or spend saved time on scarce, important tasks, and which will bosses expect?

Research

TeortaxesTex Questions GLM-5.2 Evaluation Authenticity

TeortaxesTex questions whether the GLM-5.2-Cyber evaluation results are genuine or come from an unrepresentative benchmark.

© 2026 FAV0 · AI Daily
