DashScope Pricing in 2026: Why There's No 'Subscription Plan', and What Qwen and Wanxiang Actually Cost
People keep searching 'DashScope subscription plans' and find nothing, because DashScope doesn't sell plans. It meters every call: Qwen text models are billed per token with tiered rates that change based on your request's input size, and Wanxiang image and video are billed per image and per second. Each model ships a free quota of 1M tokens or 50 images that expires 90 days after you enable Model Studio. Here are the real 2026 numbers and the three levers that actually move your bill.
Context above, deep read below. Use the TOC to move section by section without losing the thread.
A team lead messaged me last month asking which DashScope plan to put on the company card. He'd been told to "get the Qwen-Max subscription" and had spent twenty minutes in the Alibaba Cloud console looking for a pricing page with monthly tiers — Starter, Pro, Enterprise — the shape every SaaS tool has trained him to expect. There isn't one. He wasn't missing a page. He was looking for a product that doesn't exist.
DashScope, which is the API surface of Alibaba's Model Studio platform, doesn't sell subscriptions. It meters. Every call is priced and billed for exactly what it consumed, and the word people reach for — "plan" — maps onto nothing in the system. What they're actually choosing when they say "the Qwen-Max plan" is a model, and you don't subscribe to a model, you just call it. That one misunderstanding is why "dashscope api subscription plans" is a search that never finds what it's looking for, and it's worth fixing before you look at a single number, because it changes how you budget.
There is no plan. There is a meter.
Here's the mental model that makes everything else click: DashScope is a metered utility, like electricity, not a gym membership. You don't pick a tier and pay monthly. You send requests, and at the end of the billing cycle you're charged for the tokens and images those requests consumed, at each model's published rate.
That has three consequences people miss:
- You can call every model with one account, no commitment. Nothing stops you from using Qwen-Turbo for cheap bulk work and Qwen3-Max for the hard requests in the same app, same key, same hour. The "tier" is per-request, decided by which model name you put in the call.
- Idle costs nothing. No seat fee, no monthly minimum. An app that gets no traffic this month bills zero (once you're past the free quota window).
- Your bill is a function of usage shape, not a number you picked. Two teams on "the same model" can have wildly different costs because one sends short prompts and one stuffs 200K-token contexts in. Which brings us to the part that actually surprises people.
The Qwen text pricing, and the tier trap inside it
Qwen text models are billed per token, input and output priced separately. The formula is boring: (input tokens × input rate) + (output tokens × output rate). The part that isn't boring is that for most models the rate is tiered by the input size of the individual request — and the whole request bills at the tier its input size lands in, not just the overflow.
These are the International (Singapore) deployment rates in USD per 1M tokens, as of May 2026. Always confirm against the official pricing page before you commit a budget; model lineups and rates move.
| Model | Input (per 1M) | Output (per 1M) | Free quota |
|---|---|---|---|
| Qwen-Turbo | $0.05 | $0.2 (non-thinking) / $0.5 (thinking) | 1M tokens, 90 days |
| Qwen-Flash | $0.05 (≤256K) → $0.25 | $0.4 (≤256K) → $2 | 1M tokens, 90 days |
| Qwen-Plus | $0.4 (≤256K) → $1.2 | $1.2 (≤256K) → $3.6 | 1M tokens, 90 days |
| Qwen3-Max | $1.2 (≤32K) → $2.4 (≤128K) → $3 | $6 → $12 → $15 | 1M tokens, 90 days |
Read the arrows carefully, because this is the tier trap. On Qwen-Plus, a request whose input is 200K tokens bills input at $0.4/M. Push that same request to 300K input and the entire input bills at $1.2/M — three times the rate, applied to all 300K tokens, not just the 44K past the 256K line. The threshold is a cliff, not a slope. If you're anywhere near a tier boundary, trimming context to stay under it is the single highest-leverage cost move you have, and it's invisible if you only look at the headline "$0.4" number.
Qwen3-Max has three tiers on a tighter ladder (32K, then 128K, then up to 252K), so long-context work on the flagship model escalates faster than on Qwen-Plus. A reasoning agent that grows its context across a long tool-use loop can quietly cross two boundaries in one session.
The two discounts most people leave on the table
Beyond picking the right model and watching tier boundaries, DashScope has two built-in levers that cut the per-token rate directly:
Batch calls bill at 50%. If your work isn't latency-sensitive — overnight summarization, bulk classification, dataset labeling — the Batch API runs the same models at half the real-time input and output rate. For a labeling job that doesn't care whether a row finishes in 200ms or an hour, that's a straight 2× cost cut with no quality change.
Context caching discounts repeated input. When you send the same long prefix over and over — a big system prompt, a fixed document, few-shot examples — context caching lets the repeated input tokens bill at a discount. The catch worth knowing: the discount applies to input only, and it doesn't stack with Batch — you get one or the other, not both. So the decision is roughly: bursty interactive traffic with a fat shared prompt → caching; big asynchronous jobs → Batch.
Neither is a "plan" you sign up for. Batch is a different endpoint; caching is a flag on the request. Both are usage-shape choices, which is the recurring theme of this whole platform.
Tongyi Wanxiang: priced by the image, not the token
Here's where the "DashScope has one price" instinct breaks completely. Tongyi Wanxiang is Alibaba's image and video generation family, and it lives on the same platform under the same API key — but it isn't billed per token at all, because tokens are the wrong unit for a picture.
On the mainland (Beijing) deployment, Wanxiang 2.6 pricing as of May 2026:
| Capability | Price | Free quota (90 days) |
|---|---|---|
| Text-to-image (wan2.6-t2i) | ¥0.2 / image | 50 images |
| Text-to-video (wan2.6-t2v) | ¥0.6/sec (720P), ¥1/sec (1080P) | 50 seconds |
| Image-to-video (wan2.6-i2v) | ¥0.6/sec (720P), ¥1/sec (1080P) | 50 seconds |
Two things to internalize. First, the free quota counts successfully generated output images — a failed generation or an input image you upload for image-to-video doesn't burn a credit. So you can iterate on prompts that error out without watching your 50 images drain. Second, the units differ by region: the Qwen text rates above are International-deployment USD, while these Wanxiang rates are mainland RMB. Model Studio runs separate billing regions (mainland Beijing, International Singapore, and others), and you pick one — you don't get to mix a mainland Wanxiang price with an International Qwen price on one invoice. If you're costing out a product that does both text and images, price each on the region you'll actually deploy to.
For video, the per-second billing means a 5-second 1080P clip is ¥5, and the math scales linearly — there's no per-clip flat fee hiding in there. The 50-second free quota is roughly ten short 720P clips to evaluate quality before you spend anything.
Getting the API key, because that's the other thing you searched for
You can't price anything until you can call it, and the key trips people up because they look for a per-model key that doesn't exist. The flow:
- Enable Model Studio in the Alibaba Cloud console (this is what activates your free quotas and starts the 90-day clock).
- Open the API-Key page under Model Studio and create a key.
- Use that one key for everything — Qwen text, Wanxiang images, embeddings. It's a platform key, not a per-model key.
You pass it as a Bearer token. DashScope also exposes an OpenAI-compatible endpoint, so if you already have code built on the OpenAI SDK, you often don't rewrite anything: point the base URL at DashScope's compatible endpoint, swap in your key, and change the model name to qwen-plus or whichever you want. Keep the key server-side. Once the free quota expires, that key spends real money, and a leaked key spends it on someone else's prompts.
So what should you actually budget?
Drop the word "plan" and budget the way the meter works:
- Prototyping: ¥0 / $0. Live on the free quotas — 1M tokens per Qwen model and 50 Wanxiang images — for the first 90 days. That's enough to validate most ideas without a single charge.
- Cheap, high-volume text: Qwen-Turbo or Qwen-Flash, keep requests under the first tier boundary, use Batch for anything asynchronous. This is the floor, and it's genuinely cheap — fractions of a cent per typical request.
- Quality-critical text: Qwen-Plus for the balance, Qwen3-Max for the hardest requests only. Watch the tier cliffs; a context-trimming pass before the flagship model pays for itself.
- Images and video: budget ¥0.2/image and ¥0.6–1/second of video, in RMB, on the mainland deployment. Linear and predictable once you know your volume.
The reason "dashscope api pricing structure" is a confusing search isn't that the prices are hidden. It's that there's no single structure — there's a token meter for text with tier cliffs and two discounts, and a per-image/per-second meter for media, billed in different currencies on different regions. Once you see it as a metered utility instead of a subscription, the numbers stop being a maze and start being a function you can control.
Jump to a section
Pass this article along
Send it to your preferred platform or copy the link.
Before you move on
Next step
Finished reading? Continue comparing tools in the directory.
Browse tools