OpenAI Launches First In-House AI Chip Jalapeño
Designed from the ground up and produced with Broadcom, Jalapeño is purpose-built for the LLM workloads powering ChatGPT, Codex, the API, and future agentic products.
OpenAI has crossed a critical threshold in its vertical integration strategy with the announcement of Jalapeño, its first proprietary AI chip. Built in partnership with Broadcom, the chip is purpose-designed for the large language model workloads that underpin ChatGPT, Codex, the OpenAI API, and upcoming agentic products. The move signals OpenAI's intent to reduce reliance on third-party silicon as it scales inference to hundreds of millions of users. Industry observers note that custom silicon has become table stakes for frontier AI labs: Google has its TPUs, Amazon has Trainium, and Microsoft has its Maia accelerators. OpenAI's entry into the chip race closes one of the last remaining gaps in its full-stack ambition, spanning from model research all the way down to deployment hardware.
GPT-5.5 Instant Gets Major Personality and Reasoning Upgrade
OpenAI's most-used model is now better at understanding intent, handling complex constraints, and — as co-founder Greg Brockman put it — much more fun to talk to.
The update to GPT-5.5 Instant marks a meaningful step forward for OpenAI's workhorse model, which handles the majority of ChatGPT conversations. The improvements span three axes: richer intent understanding that adapts tone and depth to the user's question, more reliable handling of multi-constraint prompts, and enhanced shopping and planning features. Co-founder Greg Brockman confirmed the update delivers significant improvements across the board, noting the model has become genuinely more enjoyable to interact with — a dimension that increasingly matters as AI assistants become daily companions rather than occasional tools.
Qwen-AgentWorld: A Language World Model for Seven Agent Environments
Alibaba's Qwen team released AgentWorld, a native language world model that simulates seven agent environments — MCP, Search, Terminal, SWE, Web, OS, and Android — within a single model. Environment modeling is the training objective from day one, not a post-hoc adaptation, representing a significant architectural departure from conventional LLM training paradigms.
Kog Laneformer 2B Hits 3,000+ Tokens Per Second
Kog open-sourced Laneformer, a 2.3B-parameter model achieving over 3,000 tokens per second on single-request inference through delayed tensor parallelism (DTP). Hugging Face CEO Clement Delangue confirmed the model is publicly available with weights and code on the Hub, marking a new latency frontier for small open models. The model was pretrained on approximately 4T tokens and fine-tuned on code and reasoning data.
Claude Gets Its Own Identity: Agent Identity Access Model
Claude now operates with an independent agent identity in team channels, provisioned with its own credentials and authorized like any other teammate. When tagged in a channel, Claude acts under its own distinct identity rather than impersonating the user who invoked it. This architectural choice matters for enterprise deployment because it addresses long-standing questions about auditing and access control in AI-augmented workflows.
The most complex phenomena arise from scalable recombination of very simple rules. Whether it's galaxies, chips, or neural networks — if you find the right primitive building blocks, the complexity takes care of itself.
François Chollet
Vercel Unveils eve: An Agent Framework Built Like Next.js
Vercel released eve, an agent framework pitched as the Next.js equivalent for AI agents. Developers write instructions in Markdown and build tools in TypeScript, with persistence supported by default. The team is recruiting early adopters who are building production agents and is offering them a direct feedback channel with eve engineers. Presented as a full agent framework rather than a thin wrapper, eve aims to standardize the way developers build, deploy, and iterate on AI agents.
Kimi API Lands on AWS Marketplace
Moonshot's Kimi API is now available through AWS Marketplace, with consolidated billing and eligibility for AWS Enterprise Discount Program (EDP) commitments for enterprise customers.
Luma Connectors Plug Into Airtable, Dropbox, Google Drive
Luma Connectors integrates external tools into boards on demand and supports creative workflows with built-in AI agents across planning, generation, and iteration stages.
Now Delegates Tasks Directly from Notion via SDK
A new integration built on the Cursor SDK lets users assign tasks from Notion to Cursor's cloud agents, which run on the same models, harness, and runtime.
Computer for Counsel Targets Legal Research
Computer now connects to legal research databases, document tools, and case-management systems, providing citable sources for Pro and Max subscribers.
Grok Build Adds Official MongoDB Plugin
The official MongoDB plugin in Grok Build supports data queries, index optimization, and database management directly within the AI coding environment.
Step Plan Integrates with Claude Code for Agent Workflows
StepFun's Step Plan tool simplifies API calls and enables rapid iteration of agent workflows when paired with Claude Code, targeting real build scenarios.
GLM-5.2 Arrives in Cursor, Tops OpenRouter Rankings
Zhipu AI's GLM-5.2 model has been integrated into Cursor, showing strong usage on OpenRouter's Cursor ranking. The model scored 22.8% on ARC-AGI-2 — the best among Chinese models — placing it near Opus 4.5 levels. Nathan Lambert noted the model performs strongly on several benchmarks but exhibits brittle characteristics in edge cases, recommending multi-model strategies depending on task profiles. On CursorBench, GLM-5.2's cost sits near Opus frontier levels, putting downward pressure on closed-model pricing margins.
Nathan Lambert Releases LLM Fundamentals Lecture with GLM-5.2
Nathan Lambert published a lecture covering language model architecture basics, including the LM head, the KV cache, speculative decoding, and training fundamentals, using GLM-5.2 for demonstrations. The lecture addresses common prerequisite questions from readers of his forthcoming book and serves as an accessible entry point for understanding how modern language models work under the hood.
Seedance 2.0 in Native 4K via MCP
Seedance 2.0 delivers native 4K resolution video generation through Pika MCP, covering full production pipelines.
One-Click Ad Localization from a Single Image
Runway's new localization feature generates multi-language ad variants from one image in a single click.
V4.1 Excels at Playful Fashion Illustrations
V4.1 produces fashion illustrations with bright colors, graphic shapes, and detailed accessories.
Animator Turns 3D Previz into Full Anime
A professional animator used 3D previz as input, with Seedance rendering the final anime while preserving motion and camera control.
Qualcomm Partners with Hugging Face
At Qualcomm's Investor Day, CEO Cristiano Amon and Hugging Face CEO Clement Delangue announced a partnership as "one more thing," though specific collaboration details remain undisclosed. The move signals growing hardware-software alignment in the AI ecosystem.
Anthropic Negotiates Fable 5 Unlock with Trump Administration
WIRED reports Anthropic co-founder Tom Brown has replaced Dario Amodei as lead negotiator with the Trump administration over lifting restrictions on the Fable 5 model. One source said Brown communicates more directly than Amodei.
Huawei Claims 950 SuperPOD Demo at Shanghai Expo in Mid-July
Huawei plans to showcase an 8192-NPU, 160-cabinet SuperPOD at the Shanghai World Expo by mid-July, signalling mass production of the 950DT chip and China's entry into its domestic Hopper+ era.
MiniMax M3 Becomes Default Builder Model for Kimchi Coding
MiniMax's M3 model — with open weights, 1M context window, and strong coding capability — was selected as the default builder model in Kimchi Coding by Cast AI.
DeepMind Explores the Rise of Agent Economies
A new podcast examines what happens when millions of AI agents begin negotiating, transacting, and delegating — and how to diversify decision-making to avoid AI groupthink.
Sakana AI Partners with OpenRouter for Resilient Architecture
Sakana AI's Hardmaru announced a partnership with OpenRouter, noting products like OpenRouter Fusion and Sakana Fugu spark important conversations about dependency and resilience in AI.
Claude Tag: Useful but a Risky Bargain for Enterprises
A researcher called Claude's new Tag feature extremely useful but warned its pricing model and lock-in risk could put enterprises in dangerous bargaining positions. The four big changes together mean that users interact with Claude as a coworker rather than a tool — a shift that brings both productivity gains and vendor dependency concerns.
AI Usage Decisions Are Now Organizational Design, Not IT
Ethan Mollick argued that decisions about AI in organizations are increasingly matters of organizational design and strategy rather than technology procurement: how to integrate agents, what intelligence to outsource, where the boundaries of the firm should lie, and what role humans play.
Seedance Video Cost: 40K Tokens per Second of 1080p
Official documents reveal 1 second of Seedance video at 1080p uses 40,000 tokens. At Doubao's 180T tokens per day, that translates to roughly 150 million people generating 30 seconds each — a sobering scaling constraint.
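The arithmetic behind that estimate can be checked with a short calculation. This is a back-of-the-envelope sketch that uses only the figures quoted above and assumes, purely for illustration, that the entire daily token volume is spent on 1080p video; the function and constant names are made up for this example.

```python
# Back-of-the-envelope check of the Seedance token arithmetic.
# All three figures come from the item above; nothing here is measured.
TOKENS_PER_SECOND_1080P = 40_000      # tokens per second of 1080p video
DAILY_TOKEN_BUDGET = 180 * 10**12     # Doubao's quoted 180T tokens per day
CLIP_SECONDS = 30                     # clip length used in the comparison


def clips_per_day(daily_tokens: int, tokens_per_second: int, clip_seconds: int) -> int:
    """Return how many clips of `clip_seconds` fit into `daily_tokens`.

    Assumes (hypothetically) that the whole budget goes to video.
    """
    tokens_per_clip = tokens_per_second * clip_seconds
    return daily_tokens // tokens_per_clip


tokens_per_clip = TOKENS_PER_SECOND_1080P * CLIP_SECONDS
clips = clips_per_day(DAILY_TOKEN_BUDGET, TOKENS_PER_SECOND_1080P, CLIP_SECONDS)

print(f"{tokens_per_clip:,} tokens per {CLIP_SECONDS}-second clip")  # 1,200,000
print(f"{clips:,} clips per day")                                    # 150,000,000
```

One 30-second clip costs 1.2 million tokens, so 180T tokens cover 150 million such clips, which matches the figure in the item.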
Cola Launches Seed 2.1 Pro with ColaOS
Cola released Seed 2.1 Pro, a natively multimodal model with enhanced coding and agent capabilities over 2.0. ColaOS, described as an operating system with soul, features a persistent agent that remembers users and grows over time.
NVIDIA Full-Stack AI Powers Autonomous Brand Operations
NVIDIA highlighted its full-stack AI capabilities in causal marketing analytics, trustworthy agentic workflows, and real-time hyper-efficient auction bidding for global brands.
Agent Defined: LLM + Instructions + Tools + Environment
A simple agent definition: an LLM backbone running in an agentic loop, with four components — the model, instructions, tools, and the environment.
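That four-part definition maps naturally onto a small loop. The following is a minimal sketch, not any particular framework's API: the `Agent` class, the `("tool" | "finish", ...)` action format, and the scripted stand-in for the LLM are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical conventions for this sketch:
# an action is ("tool", tool_name, argument) or ("finish", answer, "").
Action = Tuple[str, str, str]
Model = Callable[[str, List[str]], Action]   # (instructions, history) -> action
Tool = Callable[[Dict[str, str], str], str]  # (environment, argument) -> observation


@dataclass
class Agent:
    model: Model                      # 1. the model
    instructions: str                 # 2. the instructions
    tools: Dict[str, Tool]            # 3. the tools
    environment: Dict[str, str] = field(default_factory=dict)  # 4. the environment
    max_steps: int = 8

    def run(self, task: str) -> str:
        """Agentic loop: ask the model, act in the environment, feed back the result."""
        history = [f"task: {task}"]
        for _ in range(self.max_steps):
            kind, name, arg = self.model(self.instructions, history)
            if kind == "finish":
                return name
            tool = self.tools.get(name)
            if tool is None:
                observation = f"unknown tool: {name}"
            else:
                observation = tool(self.environment, arg)
            history.append(f"{name}({arg}) -> {observation}")
        return "stopped: step limit reached"


def read_file(env: Dict[str, str], path: str) -> str:
    """Toy tool: look up a 'file' in the in-memory environment."""
    return env.get(path, "<missing>")


def scripted_model(instructions: str, history: List[str]) -> Action:
    """Stand-in for an LLM call: read one file, then answer with its contents."""
    if len(history) == 1:
        return ("tool", "read_file", "notes.txt")
    last_observation = history[-1].split(" -> ", 1)[1]
    return ("finish", f"notes.txt says: {last_observation}", "")


agent = Agent(
    model=scripted_model,
    instructions="Answer using the files provided.",
    tools={"read_file": read_file},
    environment={"notes.txt": "ship on Friday"},
)
result = agent.run("What do the notes say?")
print(result)  # notes.txt says: ship on Friday
```

Swapping `scripted_model` for a real LLM call would leave the loop unchanged, which is the point of the definition: the model is only one of the four parts.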
Zhipu AI Goes from HKD 120 IPO to Beating DeepSeek
Zhipu AI IPO'd at HKD 120 per share in January. GLM has since surpassed DeepSeek as a leading open model, and the company is returning to San Francisco.
Claude Code Web Hit by GitHub Egress Policy Block
Simon Willison reported Claude Code for Web displays "GitHub is blocked by egress policy," severely disrupting workflows that involve cloning repositories for reference documentation.
Governments Should Build Model Evaluations Like Civics Exams
A researcher proposed each government create environments and evaluations for desired model capabilities, similar to various civics examinations.
AI Commercialization Is Fundamentally a 2Boss Model
A commentary argued China's AI monetization follows a 2Boss pattern: bosses pay for programmers to use Claude and Codex, and for creators to use Seedance.
Fast GLM Model Now Live
A fast GLM variant is now available, per Vercel CEO Guillermo Rauch.
AI Gateway Token and Uptime Recovery Staggering
Vercel CEO Guillermo Rauch called the AI Gateway's data on token and availability recovery astonishing.
Eco Wave Power Uses Digital Twins for Wave Energy
Omniverse digital twins and accelerated computing simulate wave conditions for renewable energy.
Caper Carts Powered by NVIDIA Jetson Edge AI
Smart shopping carts recognize products in real grocery stores despite changing shelves and spotty Wi-Fi.
hf-claude Now Works with GLM-5.2
The hf-claude tool adds GLM-5.2 support for Hugging Face extension workflows.
Asset Collection Tool Finally Gets MCP Interface
Users can now access reference assets directly without manual transfer between applications.
Codex Saves Significant Time on Windows Tasks
Ethan Mollick credits Codex and Code with solving annoying Windows problems effortlessly.
PyTorch Profiler Beginner's Guide Published
The Hugging Face team published an introductory guide to torch.profiler for performance tuning.
Manage Cursor Skills Per Project to Save Context
A user detailed their geeky approach to installing Skills only at project level to reduce context overhead.
Launches AI Film Festival with $200K Prize Pool
Winning entries will be screened at a major film festival later this year.
Tokenmaxxing at Training, One-Shot at Inference
A researcher argued training should maximize token usage for learning while inference should target one-shot solutions for efficiency.
Fable 5 May Return as Permanent Subscription Feature
Speculation grows that Fable 5 will return soon as a permanent part of the subscription, potentially with stricter identity verification requirements.
Longtime London GDM Members Depart as Focus Shifts to MTV
Multiple veteran London-based Google DeepMind researchers have departed, consistent with reports that the center of gravity for pre-training is gradually shifting toward Mountain View.
Many Who Claim They Never Use AI Are Secretly Using It
Ethan Mollick shared research suggesting a significant gap between stated and actual AI usage, with many self-reported non-users quietly relying on AI tools in their daily workflows.
AI Podcast Transcription Tip: Generate Multiple Drafts and Merge
A user shared their approach to AI podcast transcription: generate 2–3 drafts simultaneously, select the best as a base, and merge content from others to avoid omissions and quality variance — since fixing a bad first draft is often harder than starting fresh from a better one.
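The merge step of that workflow can be roughed out in a few lines. This is only a sketch under stated assumptions: the original tip does not say how the base draft is chosen or how content is merged, so the code below assumes the draft with the most sentences is the base and uses the standard library's `difflib.SequenceMatcher` to spot sentences the base is missing. The function names and the 0.8 similarity threshold are illustrative, not part of the tip.

```python
import re
from difflib import SequenceMatcher
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Treat two sentences as the same content if they are close enough."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def merge_drafts(drafts: List[str], threshold: float = 0.8) -> str:
    """Pick a base draft, then splice in sentences that only other drafts contain."""
    sentence_lists = [split_sentences(d) for d in drafts]
    # Assumption: the draft with the most sentences is the best base.
    base_index = max(range(len(sentence_lists)), key=lambda i: len(sentence_lists[i]))
    merged = list(sentence_lists[base_index])

    for i, sentences in enumerate(sentence_lists):
        if i == base_index:
            continue
        insert_at = 0  # position just after the last sentence matched in `merged`
        for sentence in sentences:
            match = next(
                (j for j, m in enumerate(merged) if similar(sentence, m, threshold)),
                None,
            )
            if match is not None:
                insert_at = match + 1
            else:
                # Sentence is missing from the merged text: add it in context.
                merged.insert(insert_at, sentence)
                insert_at += 1
    return " ".join(merged)


drafts = [
    "Welcome to the show. Today we talk about chips.",
    "Welcome to the show. Our guest is an engineer. Today we talk about chips.",
    "Welcome to the show. Today we talk about chips. Thanks for listening.",
]
merged_text = merge_drafts(drafts)
print(merged_text)
```

In practice the choice of base draft would be a human judgment, as the tip suggests; the length heuristic here only keeps the example self-contained.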