
Alibaba Qianwen (Qwen) series released the Qwen3.7-Plus model this week. Input pricing is $0.40 per million tokens, output pricing is $1.60 per million tokens, for a total of $2.00. This is an 80% reduction compared with Qwen3.7-Max. Cached input pricing can be as low as $0.04 per million tokens. The target scenarios are high-frequency, repetitive tasks.
According to pricing information published by Alibaba official:
Standard input: $0.40 per million tokens
Standard output: $1.60 per million tokens
Total (input + output): $2.00
Cached input: $0.04 per million tokens (applies to agent scenarios such as repeatedly reading the same codebase or enterprise UI)
Comparison target: Qwen3.7-Max charges $2.50 for input, $7.50 for output, totaling $10.00. Chinese competitor MiniMax-M3 has a limited-time discount totaling $1.50, and Qwen3.7-Plus pricing closely follows it.
The following are Qwen3.7-Plus benchmark numbers published by Alibaba official; all are self-assessment data:
Terminal Bench 2.0-Terminus: 70.3 (DeepSeek-V4-Pro Max is 67.9, Gemini-3.1 Pro is 63.5)
ScreenSpot Pro (computer vision and interface understanding): 79.0 (GPT-5.4 xhigh is 67.4, Claude-Opus-4.6 is 49.5)
It is worth noting that Alibaba’s official documents also state that Qwen3.7-Plus’s overall performance is still lower than most leading closed-source U.S. models. The above numbers are single-point comparisons on specific tasks and do not represent overall performance.
Qwen3.7-Plus does not provide downloadable open model weights. All API calls must be processed through Alibaba Cloud international nodes, and data flows outside the user’s own servers. Under this architecture, the following scenarios face clear compliance barriers:
Industries with data sovereignty or regulatory constraints: healthcare (HIPAA, GDPR), defense, government agencies—need to evaluate whether external API routing meets compliance requirements
On-premises isolated deployment scenarios: cannot be deployed in a fully isolated local environment
Conversely, the advantage of a closed-source API mode is that it does not require the hardware procurement and maintenance of a multi-GPU cluster (such as Nvidia H100). In addition, the OpenAI-compatible format minimizes changes to existing infrastructure.
Cached pricing applies to scenarios where agents repeatedly read the same input, such as continuously accessing the same code repository, fixed enterprise UI templates, or system prompts kept for long periods. In large workflows with high-frequency, repetitive tasks, caching can significantly reduce total API costs. Alibaba has not published specific guarantees for cache hit rates or details about usage limitations.
Earlier Qwen series were released under Apache 2.0 and provided downloadable model weights, allowing anyone to deploy locally, fine-tune, and integrate into their own systems. Qwen3.7-Plus is provided only through Alibaba Cloud APIs and does not release model weights. This means it cannot be deployed locally or in isolated networks; all usage depends on Alibaba Cloud’s external infrastructure.
Qwen3.7-Plus’s official documentation explicitly states that scores such as Terminal Bench and ScreenSpot Pro are Alibaba’s self-assessed numbers, and overall performance is still lower than most leading closed-source U.S. models. Benchmark numbers reflect single-point performance on specific tasks and do not represent end-to-end latency, stability, or comprehensive performance in real production environments.
Related News
Microsoft Build releases 7 AI models, with Token usage 60% lower than competing products
Cisco’s stock price rises 5% in a single day, with its AI order target raised to $9 billion
Google Launches $80B Equity Financing With Berkshire $10B Investment
Qualcomm’s Dragonfly brand details have been delayed until June 24, with the pre-market share price down more than 8%