Anthropic’s Code Mode settles the MCP vs CLI battle: tools live in the runtime, tokens drop from 150K to 2K

ChainNewsAbmedia

Throughout 2025, AI engineering communities debated endlessly whether MCP or CLI is better suited for agent tool calling. In November 2025, Anthropic’s engineering post “Code execution with MCP” reframed the problem from first principles. Twitter user akshay_pachaar summarized it in a thread: the issue was never the protocol itself, but the old habit of stuffing every tool’s description into the context at the start of a session. Anthropic’s solution is to have the model write code that calls tools, while the runtime manages the tool details. The new approach is called “Code Mode”.

The problem with the old mode: most of the model’s 150K tokens go unused

How the old MCP mode wastes tokens:

Playwright MCP: 13.7K tokens (loaded all at once)

Chrome DevTools MCP: 18K tokens

Five servers configured: 55K tokens burned before any work begins

Single workflow fully executed: can bloat to 150K tokens

What the model actually uses: a small fraction; most of the loaded definitions are never called

Critics argue for switching to CLI, but CLI is error-prone in multi-tenant apps, lacks typed contracts, and forces agents unfamiliar with an API to spend extra rounds parsing text output. Both sides have a point, but both have misdiagnosed the problem.

The solution: have the model write code to call tools instead of calling them directly from context

The core of Anthropic’s proposed “Code Mode”:

Flip the model’s role: instead of the model calling tools through context, the model writes code and the runtime calls the tools

Tools live in runtime, and the model only sees the part it imports

Types follow the imports: the model imports a tool, and it gets that tool’s type contract

Call already-installed binaries via Bash (git, curl, etc.)

Use typed module imports to call proprietary APIs
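The “call installed binaries” point above can be sketched in a few lines. This is my assumption of what agent-generated code might look like, not Anthropic’s API; the `run` helper is illustrative.

```typescript
// Minimal sketch: agent-written code shelling out to an already-installed
// binary instead of routing the call through a per-command MCP tool
// definition. The `run` helper is a hypothetical convenience, not a
// runtime-provided function.
import { execFileSync } from "node:child_process";

// Run a binary and return its trimmed stdout.
function run(cmd: string, args: string[]): string {
  return execFileSync(cmd, args, { encoding: "utf8" }).trim();
}

// The model could emit, e.g.:
//   run("git", ["rev-parse", "--abbrev-ref", "HEAD"])
// Using `node` here so the sketch runs anywhere Node is installed:
console.log(run("node", ["--version"]));
```

The binary’s output stays in the execution environment; only what the code chooses to surface reaches the model.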

Anthropic’s example: transcript text from Google Drive flows into a Salesforce CRM update. In the old approach, the schemas for both tools are loaded up front and the entire transcript passes through the model twice; in the new approach, roughly 10 lines of TypeScript import only what is needed, and the same task shrinks from 150K to 2K tokens, a 98.7% reduction.
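A hedged sketch of that “10 lines of TypeScript” pattern follows. The `gdrive` and `salesforce` objects are inline stand-ins for the typed wrapper modules the runtime would generate; all names here are illustrative, not Anthropic’s actual APIs.

```typescript
// Stand-in for a runtime-generated Google Drive tool wrapper (stubbed):
const gdrive = {
  async getTranscript(docId: string): Promise<string> {
    return `transcript of ${docId}`; // real wrapper would call the Drive tool
  },
};

// Stand-in for a runtime-generated Salesforce tool wrapper (stubbed):
const salesforce = {
  async updateRecord(recordId: string, fields: { Notes: string }) {
    return { recordId, ...fields, updated: true }; // real wrapper would call the CRM tool
  },
};

// The glue the model actually writes: the transcript moves runtime-to-runtime
// and never round-trips through the model's context window.
async function syncMeeting(docId: string, recordId: string) {
  const transcript = await gdrive.getTranscript(docId);
  return salesforce.updateRecord(recordId, { Notes: transcript });
}

syncMeeting("doc-123", "crm-456").then((r) => console.log(r.updated));
```

The token saving comes from the data path: only the short glue code and its final result touch the model, not the transcript or either tool’s schema.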

Cloudflare pushed it to the limit: 2,500 API endpoints compressed from 1.17M tokens to 1K

Cloudflare did the most aggressive version:

Original API scale: 2,500 API endpoints, with schemas totaling 1.17M tokens

New approach: expose only two functions, search and execute, totaling 1K tokens

The agent writes code that first searches the tool catalog, then executes the matching tools

Compression ratio: over 1,000x
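The search-then-execute surface described above can be sketched as follows. This is a hypothetical illustration of the pattern, not Cloudflare’s actual API; the function names, catalog entries, and return shapes are all assumptions.

```typescript
// Two-function tool surface: the model sees only searchTools and
// executeTool up front (~1K tokens), not 2,500 endpoint schemas.

type ToolEntry = { name: string; description: string };

// Hypothetical catalog; the real one would index every endpoint.
const catalog: ToolEntry[] = [
  { name: "dns.listRecords", description: "List DNS records for a zone" },
  { name: "workers.deploy", description: "Deploy a Worker script" },
  { name: "kv.get", description: "Read a key from a KV namespace" },
];

// search: discover tools by keyword instead of preloading all schemas.
function searchTools(query: string): ToolEntry[] {
  const q = query.toLowerCase();
  return catalog.filter(
    (t) =>
      t.name.toLowerCase().includes(q) ||
      t.description.toLowerCase().includes(q),
  );
}

// execute: dispatch to the real endpoint at runtime (stubbed here).
async function executeTool(name: string, args: Record<string, unknown>) {
  return { tool: name, args, status: "ok" };
}

// Agent-written code: discover the DNS tool, then invoke it.
const [hit] = searchTools("dns");
executeTool(hit.name, { zone: "example.com" }).then((r) => console.log(r.tool));
```

Only the schemas of tools the search actually returns ever enter the model’s context, which is what makes the 1,000x compression possible.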

The claim that “MCP is dead” is wrong: Anthropic reports that MCP SDK downloads have reached 300 million, up from 100 million at the start of the year, making it one of the fastest-growing pieces of agent infrastructure. What is “dead” is loading every tool at session start, which was a bad idea in the first place. For developers writing agents in 2026, the rule is simple: tool definitions belong in code, not context; the model writes a few lines of code to make the call, and the runtime handles the rest.

Trackable follow-ups: whether MCP SDK downloads keep growing past 300 million; whether Anthropic standardizes Code Mode as the officially recommended mode in the MCP spec; and how quickly other agent platforms such as OpenAI, Google, and Cursor adopt Code Mode.

This article on Anthropic’s Code Mode resolving the MCP vs CLI debate: tools live in runtime, tokens compressed from 150K to 2K first appeared on Lian News ABMedia.

