Throughout 2025, AI engineering communities debated endlessly whether MCP or the CLI is better suited for agent tool calling. In November 2025, Anthropic's engineering post "Code execution with MCP" reframed the problem from first principles. As akshay_pachaar summarized in his thread (5/10), the issue was never the protocol itself, but the old habit of stuffing the descriptions of all tools into the context at the start of a session. Anthropic's solution is to have the model write code to call tools, while the runtime manages the tool details. The new approach is called "Code Mode".
The problem with the old mode: most of the model’s 150K tokens go unused
How the old MCP mode wastes tokens:
Playwright MCP: 13.7K tokens, loaded all at once
Chrome DevTools MCP: 18K tokens
Five server configurations: 55K tokens burned before any real work begins
A single fully executed workflow: can bloat to 150K tokens
What the model actually uses: only a small fraction of all that
Critics argue for switching to the CLI, but the CLI is error-prone in multi-tenant apps, lacks typed contracts, and an agent unfamiliar with a tool's API needs extra round trips to parse its text output. Both sides have a point, but both misdiagnose the problem.
The solution: have the model write code to call tools instead of calling them directly from context
The core of Anthropic’s proposed “Code Mode”:
Flip the model's role: the model no longer calls tools through its context; instead, it writes code and the runtime calls the tools
Tools live in the runtime, and the model sees only the parts it imports
Types follow the imports: when the model imports a tool, it gets that tool's type contract
Call already-installed binaries via Bash (git, curl, etc.)
Use typed module imports to call proprietary APIs
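The last two bullets can be sketched in a few lines. The shell call uses Node's standard `child_process`; the commented-out tool import path is an assumption about how a runtime might lay out its tool tree, not Anthropic's actual layout:

```typescript
import { execSync } from "node:child_process";

// Style 1: call an already-installed binary through the shell
// (echo stands in here for git, curl, etc.).
const shellOut = execSync("echo hello-from-bash").toString().trim();
console.log(shellOut);

// Style 2: import a typed tool module from the runtime's tool tree.
// Only this import line and its type signature enter the model's context:
//
//   import { getDocument } from "./servers/google-drive";
//
// The path above is illustrative, not Anthropic's actual file layout.
```

The key design point: in both styles, the tool's implementation details stay in the runtime; the context only carries the code that invokes them.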
Anthropic's example: a meeting transcript from Google Drive flows into a Salesforce CRM update. In the old approach, you load both tools' schemas and push the entire transcript through the model twice; in the new approach, about 10 lines of TypeScript import only what is needed, and the same task shrinks from the original 150K tokens down to 2K, a 98.7% reduction.
Cloudflare pushed it to the limit: 2,500 API endpoints, compressed from 1.17M tokens to 1K
Cloudflare did the most aggressive version:
Original API scale: 2,500 endpoints, with schemas totaling 1.17M tokens
New approach: expose only two functions, search and execute, totaling 1K tokens
The agent writes code first to search the tool directory, then execute the corresponding tools
Compression ratio: over 1,000x (1.17M tokens down to roughly 1K)
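A toy version of that two-function surface. The `search`/`execute` names come from the article; the signatures, the in-memory directory, and the dispatch logic are illustrative assumptions:

```typescript
type ToolInfo = { name: string; description: string };

// Stand-in for a directory that would really index ~2,500 endpoints:
const directory: ToolInfo[] = [
  { name: "dns.createRecord", description: "Create a DNS record in a zone" },
  { name: "workers.deploy", description: "Deploy a Worker script" },
];

// search: the only discovery surface the agent sees. Instead of 1.17M
// tokens of schemas, the context carries just this one signature.
function search(query: string): ToolInfo[] {
  const q = query.toLowerCase();
  return directory.filter(
    (t) =>
      t.name.toLowerCase().includes(q) ||
      t.description.toLowerCase().includes(q)
  );
}

// execute: dispatches by tool name; a real runtime would call the
// actual endpoint here instead of echoing the call back.
async function execute(name: string, args: unknown): Promise<unknown> {
  return { called: name, args };
}

// Agent-written code: discover the tool, then call it.
async function main() {
  const [tool] = search("dns");
  const result = await execute(tool.name, { zone: "example.com", type: "A" });
  console.log(result);
}

main();
```

The design choice worth noting: discovery becomes lazy. The agent pays the token cost only for the tools it actually searches for and uses, not for the whole catalog up front.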
The claim that "MCP is dead" is wrong: Anthropic reports that MCP SDK downloads have reached 300 million, up from 100 million at the start of the year, making it one of the fastest-growing pieces of agent infrastructure right now. What is "dead" is loading all tools at once when a session starts, and that was a bad idea in the first place. For developers writing agents in 2026 the rule is simple: tool definitions belong in code, not context; the model writes a few lines of code to make the call, and the runtime handles the rest.
Specific trackable follow-ups: the continued growth of MCP SDK downloads past the 300 million mark; whether Anthropic standardizes Code Mode as the officially recommended mode in the MCP spec; and Code Mode adoption by other agent platforms such as OpenAI, Google, and Cursor.
This article on Anthropic’s Code Mode resolving the MCP vs CLI debate: tools live in runtime, tokens compressed from 150K to 2K first appeared on Lian News ABMedia.