xAI Launches Grok Voice API Voice Cloning Feature
xAI has released voice cloning for the Grok Voice API. It clones natural-sounding voices from short audio recordings, and voice libraries can be managed in the xAI console for personalized brand voices.
Two voices. One human. One AI. Voice cloning rich with natural emotion is now live on the Grok Voice API. Users can clone voices from short recordings and manage voice libraries through the xAI console, opening up personalized voice experiences for brands and developers. The feature supports natural emotional inflection, making cloned voices indistinguishable from human speech in conversation.
Ollama Supports Claude Desktop, Enables Third-Party Inference
All models on Ollama Cloud can now be used in Claude Cowork and Claude Code through Claude Desktop's built-in third-party inference feature.
Ollama now supports Claude Desktop via built-in third-party inference. The integration allows all models from Ollama Cloud to be used across Claude Cowork and Claude Code directly from the Claude Desktop app. This bridges the gap between open models served through Ollama and frontier AI coding tools, giving developers a path to use those models within Anthropic's ecosystem.
We need to create a new term for the attacks some Chinese labs are doing on APIs that is different than distillation, or else we risk tarnishing a crucial technique that is fundamental to AI diffusion, academic research, and the open-source ecosystem.
Nathan Lambert, interconnects.ai
Perplexity Computer Integrates with Microsoft Teams
Perplexity Computer is now available within Microsoft Teams, allowing users to conduct research, analysis, and document creation directly in the Teams workspace with the same capabilities as the standalone Computer product.
Luma Launches Creative Agent for Full Ad Systems
Luma Agents automates the entire process from planning and generation to iterative optimization, turning creative ideas into complete advertising systems. Users define the concept and aesthetic direction, then the agent handles the rest.
GB300 Ultra NVL72 Leaks: 2.7x Faster Than GB200 on Inference
SemiAnalysis reports that the GB300 Ultra NVL72 is 2.7 times faster than the GB200 NVL72 on industry-standard inference benchmarks, a significant generational leap in inference performance.
DeepSeek-V4: Hybrid Attention Cuts KV Cache by up to 90%, Supports 1M-Token Context
DeepSeek-V4 uses a hybrid attention and sparse MoE architecture that reduces KV cache by up to 90%, enabling support for context lengths of one million tokens while maintaining inference efficiency.
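To give a sense of scale, here is a rough back-of-the-envelope sketch of what an up-to-90% KV cache reduction could mean at a one-million-token context. The layer count, KV head count, head dimension, and 2-byte precision below are hypothetical placeholders, not DeepSeek-V4's published configuration, and the baseline formula assumes a standard attention cache rather than the model's actual design.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence under standard attention."""
    # Factor of 2: one key vector and one value vector per token, per layer, per KV head.
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_value


# Hypothetical model dimensions, chosen only for illustration.
baseline = kv_cache_bytes(tokens=1_000_000, layers=60, kv_heads=8, head_dim=128)

# Apply the "up to 90%" reduction quoted above, using integer math to avoid rounding noise.
reduction_pct = 90
reduced = baseline * (100 - reduction_pct) // 100

print(f"baseline cache: {baseline / 1e9:.2f} GB")
print(f"reduced cache:  {reduced / 1e9:.2f} GB")
```

Under these assumed dimensions the cache for a single sequence shrinks by an order of magnitude, which is the kind of saving that makes very long contexts practical to serve.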
NVIDIA: AI Is a Five-Layer Cake — Energy, Chips, Infrastructure, Models, Apps
NVIDIA frames AI infrastructure as five interdependent layers: energy, chips, infrastructure, models, and applications. The countries and companies that build the full stack will define the next industrial era.
IBM Granite 4.1-8B Released, Optimized for 8–16GB VRAM Hardware
The IBM Granite 4.1-8B model is now open-sourced on Hugging Face, specifically optimized for hardware with 8 to 16GB of VRAM, advancing the frontier of accessible open-source AI for developers.
nanowhale: Small DeepSeek Model Fully Pretrained by an Agent
Inspired by Karpathy's nanochat, nanowhale is a tiny DeepSeek model entirely pretrained by an AI agent, showcasing automated model training as a new paradigm. The project demonstrates that agents can handle the full pretraining pipeline autonomously.
XGrammar-2: Structured Generation for Complex Agent Harnesses
XGrammar-2 introduces structured generation for complex agent frameworks, supporting strict tool-calling formats with built-in DeepSeek integration. It ensures reliable output formatting for multi-agent orchestration scenarios.
Grok 4.3 Builds an Entire Game from a Single Prompt
Grok 4.3 demonstrated the ability to build a complete playable game from a single prompt, featuring the fastest token output speed of any model and outperforming Claude Sonnet in end-to-end generation speed.
François Chollet's "Deep Learning with Python" Now Free to Read Online
The definitive guide to deep learning, which sold 120,000 copies and helped tens of thousands launch their careers, is now available to read online for free. The book demystifies how deep learning works and how to apply it effectively.
Replit: Build Full Pitch Decks by Describing What You Want
Replit now lets users generate full pitch decks without touching a single slide. Describe your idea, iterate in chat, edit visually, then export to PPTX, Google Slides, or PDF, or publish as a live URL.
Web2BigTable: Multi-Agent LLM System for Internet-Scale Search
Web2BigTable is a bi-level multi-agent framework for internet-scale web search and table extraction. On the WideSearch benchmark, it achieves an Avg@4 success rate of 38.50, far ahead of the second-place score of 5.10.
Qwen 3.6: High TPS on Just 12GB VRAM
Community-shared Qwen 3.6 configs deliver fast tokens-per-second even on consumer GPUs with only 12GB VRAM.
Can Open-Weight Coding Agents Match Claude Code?
A new study explores whether open-weight coding agents, paired with harnesses, can rival Claude Code at training domain-specific models.
Blackwell Ultra: Named for Ultra Performance
NVIDIA's Blackwell Ultra derives its name from its ultra-high GPU performance, confirmed by SemiAnalysis.
Anthropic co-founder Jack Clark says there is a 60% chance of recursive self-improvement (RSI) by the end of 2028.
via @goodside
AI Multi-Modal Learning Platform for Deaf Students
Replit CEO Amjad Masad spotlights an AI-powered multi-modal learning platform purpose-built for deaf students.
Most Agentic Parallelism Anywhere Online Happens on Replit
Amjad Masad notes Replit hosts more parallel agentic development activity than any other internet platform: 10 active, 198 draft, 700+ completed.
Hugging Face Model Visualizer Lets You Explore Any Architecture
A new community tool visualizes Hugging Face model architectures at any granularity by simply entering a model URL, supporting layer-level exploration and cross-model comparison.
Top Papers: Recursive Multi-Agent Systems and World Modeling
Hugging Papers highlights the week's best research on recursive multi-agent systems, agentic world modeling, and AI organizational structures.
UniVidX: Unified Multimodal Framework for Video Generation via Diffusion Priors
UniVidX proposes a unified multimodal framework leveraging diffusion priors, achieving SOTA on RGB and RGBA layer composition tasks.
DeepSeek, Xiaomi, OpenAI Models Trending on Hugging Face
Current trending open models on Hugging Face include releases from DeepSeek, Xiaomi, OpenAI, Mistral AI, and AI Pool, reflecting a diverse open-source landscape.
Software Is a Cache of Agents
A thought-provoking thesis: traditional software is essentially a cache of proven agent workflows, crystallizing reliable multi-step processes into deterministic logic that no longer requires runtime reasoning.
Transformer Gradients Are Sparse — Low-Rank Exploration Justified
An investigation into Transformer gradients reveals they are sparse in certain dimensions, validating low-rank approximation methods for efficient model training and fine-tuning.
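As a rough illustration of why low-rank methods are attractive for training and fine-tuning, the sketch below compares how many values are needed to store a full gradient matrix versus a rank-r factorization of it. The matrix shape and rank are hypothetical, and this is only storage arithmetic, not the method used in the investigation.

```python
def dense_values(rows, cols):
    """Number of values in a full rows x cols gradient matrix."""
    return rows * cols


def low_rank_values(rows, cols, rank):
    """Number of values in a rank-r factorization G ~ A @ B,
    where A is rows x rank and B is rank x cols."""
    return rank * (rows + cols)


# Hypothetical shape and rank, chosen only for illustration.
rows, cols, rank = 4096, 4096, 16

dense = dense_values(rows, cols)
factored = low_rank_values(rows, cols, rank)
ratio = dense / factored

print(f"dense: {dense} values, rank-{rank} factors: {factored} values, ratio: {ratio:.0f}x")
```

The saving grows as the rank shrinks relative to the matrix dimensions, which is why it matters whether gradients really concentrate in a small number of directions.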
Claude 4.7 Accurately Explains the Origins of Prompt Injection
A Claude 4.7 research report precisely traced the history of prompt injection attacks, accurately referencing early tweets and adversarial examples that first demonstrated the vulnerability.
Luma Agents Generate Winning Client Pitch Boards
Luma Agents automatically plans, generates, and optimizes client pitch boards. Users set the brief and aesthetic direction, and the agent produces high-quality proposals designed to win.