AI API pricing spans roughly three orders of magnitude, from under $0.10 per million tokens to over $75 per million tokens. DeepSeek V3 and Gemini 2.0 Flash Lite offer the best price-performance ratio; GPT-4o mini and Claude 3.5 Haiku are the best values from major US providers.
Understanding AI API Pricing
AI APIs charge separately for input tokens (the text you send) and output tokens (the text the model generates). Output tokens typically cost 3–5x more than input tokens because generating them requires more computation. As a result, tasks that produce long responses are relatively more expensive than tasks that consume long inputs.
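The billing arithmetic is simple enough to sketch. A minimal example, with rates copied from the table below (the model keys and dict layout are illustrative, not any provider's SDK):

```python
# Illustrative per-million-token rates (USD), taken from the pricing table below.
RATES = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "claude-3.5-haiku": {"input": 0.80, "output": 4.00},
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request: each token class billed at its own rate."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# A 1,300-token prompt with a 300-token reply on GPT-4o mini:
cost = call_cost("gpt-4o-mini", 1300, 300)  # $0.000375
```

Note that even though the reply here is 4x shorter than the prompt, it accounts for nearly half the bill, which is the asymmetry described above.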
Full Pricing Matrix (April 2026)
| Model | Provider | Input ($/M tokens) | Output ($/M tokens) | Value Score* |
|---|---|---|---|---|
| Gemini 2.0 Flash Lite | Google | $0.075 | $0.30 | ⭐⭐⭐⭐⭐ |
| DeepSeek V3 | DeepSeek AI | $0.27 | $1.10 | ⭐⭐⭐⭐⭐ |
| Gemini 2.0 Flash | Google | $0.10 | $0.40 | ⭐⭐⭐⭐⭐ |
| GPT-4o mini | OpenAI | $0.15 | $0.60 | ⭐⭐⭐⭐ |
| Claude 3.5 Haiku | Anthropic | $0.80 | $4.00 | ⭐⭐⭐⭐ |
| Mistral Small | Mistral AI | $0.20 | $0.60 | ⭐⭐⭐⭐ |
| Llama 3.3 70B (Together.ai) | Meta / Together | $0.60 | $0.90 | ⭐⭐⭐⭐ |
| DeepSeek R1 | DeepSeek AI | $0.55 | $2.19 | ⭐⭐⭐⭐ (reasoning) |
| Mistral Large 2 | Mistral AI | $2.00 | $6.00 | ⭐⭐⭐ |
| GPT-4o | OpenAI | $2.50 | $10.00 | ⭐⭐⭐ |
| Claude 3.5 Sonnet | Anthropic | $3.00 | $15.00 | ⭐⭐⭐ |
| Llama 4 Maverick (Together.ai) | Meta / Together | $0.27 | $0.85 | ⭐⭐⭐⭐ |
| Gemini 2.0 Pro | Google | $10.00 | $30.00 | ⭐⭐ (specialty only) |
| Claude 4 Opus | Anthropic | $15.00 | $75.00 | ⭐⭐ (specialty only) |
| o3 | OpenAI | $10.00 | $40.00 | ⭐⭐ (reasoning specialty) |
| o4-mini | OpenAI | $1.10 | $4.40 | ⭐⭐⭐⭐ (reasoning) |
| GPT-5 | OpenAI | $7.50 | $30.00 | ⭐⭐⭐ (frontier) |
*Value Score = capability relative to cost. ⭐⭐⭐⭐⭐ = exceptional value; ⭐ = expensive relative to capability.
The Best Value Models by Use Case
Best for High-Volume Text Processing
Gemini 2.0 Flash Lite ($0.075/M input) is the cheapest capable model available. At this price, processing 1 billion tokens costs $75 — affordable for large-scale applications. Quality is below frontier models but sufficient for categorization, extraction, and simple generation tasks.
Best for Frontier Quality at Low Cost
DeepSeek V3 ($0.27/M input) achieves near-GPT-4o performance at one-ninth the price. For applications needing high quality without requiring the absolute frontier, DeepSeek V3 offers the best quality-to-cost ratio. Caveat: Chinese origin and data sovereignty considerations apply.
Best Value from a US Provider
GPT-4o mini ($0.15/M input) or Claude 3.5 Haiku ($0.80/M input). GPT-4o mini is cheaper but slightly lower quality; Claude 3.5 Haiku costs roughly 5x more on input tokens but maintains Anthropic's safety and reliability characteristics.
Best for Reasoning Tasks
o4-mini ($1.10/M input) offers near-o3 reasoning capability at roughly one-ninth the cost. For math, logic, and complex coding tasks where you need reasoning-model quality but can't justify o3 pricing, o4-mini is the best option.
Practical Cost Calculations
To ground these numbers: processing a typical 1,000-word document produces approximately 1,300 input tokens and 300 output tokens:
| Model | Cost per 1,000-word query | Cost per 1M queries |
|---|---|---|
| Gemini 2.0 Flash Lite | $0.000188 | $188 |
| DeepSeek V3 | $0.000681 | $681 |
| GPT-4o mini | $0.000375 | $375 |
| GPT-4o | $0.00625 | $6,250 |
| Claude 3.5 Sonnet | $0.00840 | $8,400 |
| Claude 4 Opus | $0.0420 | $42,000 |
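The per-query figures above follow directly from the rate table. A quick check, using the same 1,300-input / 300-output split and the rates listed earlier in this article:

```python
# Recompute the per-query costs from the listed rates
# (USD per million tokens; 1,300 input + 300 output tokens per query).
rates = {
    "Gemini 2.0 Flash Lite": (0.075, 0.30),
    "DeepSeek V3": (0.27, 1.10),
    "GPT-4o mini": (0.15, 0.60),
    "GPT-4o": (2.50, 10.00),
    "Claude 3.5 Sonnet": (3.00, 15.00),
    "Claude 4 Opus": (15.00, 75.00),
}

for model, (inp, out) in rates.items():
    per_query = (1300 * inp + 300 * out) / 1e6
    # Cost per 1M queries is just the per-query cost times one million.
    print(f"{model}: ${per_query:.6f} per query, ${per_query * 1e6:,.0f} per 1M queries")
```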
Pricing Gotchas to Know
- Caching discounts: OpenAI and Anthropic offer prompt caching — if you send the same system prompt repeatedly, cached tokens cost 50–90% less. At scale, this significantly reduces costs for applications with fixed system prompts.
- Batch API discounts: Some providers offer 50% discounts for batch (non-real-time) processing. If you can tolerate 24-hour turnaround, batch is substantially cheaper.
- Thinking tokens: Reasoning models generate thinking tokens that also cost money, even if not shown to users. o3 and DeepSeek R1 can generate many thinking tokens on hard problems — actual cost may be much higher than list price suggests.
- Different models for different tasks: Production applications often use cheap models for simple tasks and expensive models for complex ones. This "model tiering" can reduce overall costs by 60–80%.
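The tiering idea can be sketched as a router that sends each request to a cheap model and escalates only when a heuristic flags the task as complex. The keyword heuristic, model names, and length threshold below are hypothetical placeholders; real systems often use a small classifier model or the cheap model's own confidence signal instead.

```python
# Hypothetical model-tiering router: cheap model by default,
# premium model only when the request looks complex.
CHEAP_MODEL = "gpt-4o-mini"          # ~$0.15/M input tokens
PREMIUM_MODEL = "claude-3.5-sonnet"  # ~$3.00/M input tokens

# Placeholder complexity signals; a production router would be learned, not keyword-based.
COMPLEX_HINTS = ("prove", "debug", "multi-step", "analyze tradeoffs")

def pick_model(prompt: str) -> str:
    """Route long or complexity-flagged prompts to the premium tier."""
    text = prompt.lower()
    if len(text) > 4000 or any(hint in text for hint in COMPLEX_HINTS):
        return PREMIUM_MODEL
    return CHEAP_MODEL

pick_model("Summarize this paragraph")        # routes to the cheap tier
pick_model("Please debug this stack trace")   # escalates to the premium tier
```

If, say, 80% of traffic stays on the cheap tier, blended cost per query drops toward the cheap model's rate, which is where the 60–80% savings figure comes from.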
Frequently Asked Questions
What's the cheapest AI API that's still good?
Gemini 2.0 Flash and GPT-4o mini offer the best quality-to-cost ratio among US providers. DeepSeek V3 offers near-frontier quality at $0.27/M tokens if you're comfortable with its Chinese origin.
Is paying for a subscription better than API access?
Depends on volume and use case. ChatGPT Plus ($20/month) and Claude Pro ($20/month) are consumer subscriptions with generous quotas for interactive use, but they don't provide programmatic access. For programmatic workloads the API is the practical option, and it's often cheaper than the flat fee anyway: with a cheap model, tens of thousands of short queries per month can cost less than $20.
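To put numbers on the subscription comparison, using the per-query costs computed earlier in this article:

```python
# How many 1,300-input / 300-output token queries does $20/month buy at API rates?
SUBSCRIPTION = 20.00  # USD per month, matching ChatGPT Plus / Claude Pro

per_query = {
    "GPT-4o mini": 0.000375,  # from the cost table above
    "GPT-4o": 0.00625,
}

for model, cost in per_query.items():
    queries = SUBSCRIPTION / cost
    print(f"{model}: ~{queries:,.0f} queries per month for ${SUBSCRIPTION:.0f}")
```

At GPT-4o rates, $20 covers about 3,200 queries a month (roughly 100 a day); on GPT-4o mini, the same $20 stretches past 50,000 queries.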
Do prices change frequently?
AI API prices have generally decreased over time — sometimes dramatically. GPT-4-class capability that cost $60/M tokens in 2023 costs $2.50/M in 2025. Prices for established models tend to decrease; new frontier models launch at premium prices.
How can I estimate my monthly API costs?
Multiply your estimated monthly token volume (input + output) by the per-million-token cost. Add 20% buffer for variance. Most providers offer usage dashboards and alerts to help track costs in real time.
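The estimate described above, as a formula. The token volumes in the example are assumptions for illustration; plug in your own:

```python
def monthly_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float,
                 buffer: float = 0.20) -> float:
    """Estimated monthly USD cost. Rates are USD per million tokens;
    a 20% buffer for variance is added by default."""
    base = (input_tokens * input_rate + output_tokens * output_rate) / 1e6
    return base * (1 + buffer)

# Example: 50M input + 10M output tokens/month on GPT-4o mini ($0.15 / $0.60):
estimate = monthly_cost(50_000_000, 10_000_000, 0.15, 0.60)  # $16.20
```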