Cheapest AI APIs in 2026: Complete Provider Ranking & Full Pricing Breakdown
DeepSeek dominates budget AI with input prices 90% lower than competitors, while Gemini 2.5 Flash-Lite and GPT-4o-mini offer the best value for mainstream applications. This comprehensive analysis covers 50+ models from 20+ providers, with exact per-million-token pricing, hidden costs, and optimization strategies to slash your AI spend by up to 90%.
The AI API market has undergone dramatic price compression since late 2025. DeepSeek's unified pricing model now delivers flagship-quality reasoning at $0.28/M tokens input (with cache hits dropping to $0.028). Google's Gemini 2.5 Flash-Lite disrupts further at $0.10/M input, while open-source hosts like DeepInfra serve Llama 3.1 8B at just $0.02/M tokens. For developers building cost-sensitive applications, these price points represent a fundamental shift—what cost $100 in 2024 now costs under $5.
OpenAI API Pricing (January 2026)
OpenAI's pricing structure now spans five generations of models. The GPT-4o family remains the production workhorse, with GPT-4o-mini offering exceptional value at $0.15/$0.60 per million tokens (input/output).
GPT-4o Family
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Cached Input | Context |
|---|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | $1.25 | 128K |
| GPT-4o-mini | $0.15 | $0.60 | $0.075 | 128K |
| GPT-4.1 | $2.00 | $8.00 | $0.50 | 1M |
| GPT-4.1-mini | $0.40 | $1.60 | $0.10 | 1M |
| GPT-4.1-nano | $0.10 | $0.40 | $0.025 | 1M |
| GPT-4 Turbo | $10.00 | $30.00 | — | 128K |
O-Series Reasoning Models
The o-series reasoning models command premium pricing due to internal "thinking tokens" that inflate actual costs beyond listed rates:
| Model | Input (per 1M) | Output (per 1M) | Cached Input | Notes |
|---|---|---|---|---|
| o1 | $15.00 | $60.00 | $7.50 | Full reasoning |
| o1-mini | $1.10 | $4.40 | $0.55 | Budget reasoning |
| o3 | $2.00 | $8.00 | $0.50 | 80% price cut (June 2025) |
| o3-mini | $1.10 | $4.40 | $0.55 | Most cost-effective reasoning |
| o4-mini | $1.10 | $4.40 | $0.55 | Latest mini reasoning |
OpenAI Embedding Models
| Embedding Model | Standard | Batch (50% off) |
|---|---|---|
| text-embedding-3-small | $0.02 | $0.01 |
| text-embedding-3-large | $0.13 | $0.065 |
OpenAI Audio & Image Pricing
| Service | Price |
|---|---|
| Whisper transcription | $0.006/minute |
| TTS Standard | $15/M characters |
| DALL-E 3 (1024×1024 HD) | $0.08/image |
| DALL-E 2 (1024×1024) | $0.02/image |
Anthropic Claude API Pricing (January 2026)
Anthropic's Claude 4.5 release (November 2025) brought transformative 67% price reductions on Opus-tier models while improving capability. Claude Haiku 3 remains the budget champion at $0.25/$1.25 input/output.
Claude Model Pricing
| Model | Input (per 1M) | Output (per 1M) | Notes |
|---|---|---|---|
| Claude Opus 4.5 | $5.00 | $25.00 | 67% cheaper than Opus 4.1 |
| Claude Sonnet 4.5 | $3.00 | $15.00 | 1M context support |
| Claude Haiku 4.5 | $1.00 | $5.00 | Fastest, cost-efficient |
| Claude Haiku 3.5 | $0.80 | $4.00 | Popular mid-tier |
| Claude Haiku 3 | $0.25 | $1.25 | Budget champion |
Claude Prompt Caching Discounts
Cache reads cost only 10% of base input price:
| Model | Standard Input | Cache Write (5m) | Cache Read (90% savings) |
|---|---|---|---|
| Sonnet 4.5 | $3.00 | $3.75 | $0.30 |
| Haiku 3.5 | $0.80 | $1.00 | $0.08 |
| Haiku 3 | $0.25 | $0.30 | $0.025 |
Batch API offers 50% discount across all models, making Claude Haiku 3 batch pricing drop to just $0.125/M input.
Google Gemini API Pricing (January 2026)
Google's pricing strategy centers on aggressive free tiers and the ultra-competitive Gemini 2.5 Flash-Lite at $0.10/$0.40 per million tokens.
Gemini Model Pricing
| Model | Free Tier | Input (per 1M) | Output (per 1M) |
|---|---|---|---|
| Gemini 2.5 Flash-Lite | ✓ | $0.10 | $0.40 |
| Gemini 2.5 Flash | ✓ | $0.30 | $2.50 |
| Gemini 2.5 Pro (≤200K) | ✓ | $1.25 | $10.00 |
| Gemini 2.5 Pro (>200K) | ✓ | $2.50 | $15.00 |
| Gemini 2.0 Flash | ✓ | $0.10 | $0.40 |
| Gemini 2.0 Flash-Lite | ✓ | $0.075 | $0.30 |
| Gemini 3 Flash (preview) | ✓ | $0.50 | $3.00 |
| Gemini 3 Pro (preview) | No | $2.00 | $12.00 |
Gemini Free Tier Limits
- Gemini 2.5 Flash-Lite: 1,000 requests per day
- Gemini 2.5 Pro: 50 requests per day
- All Flash models: Generous free limits for prototyping
Context caching saves 75-90% on input costs—Gemini 2.5 Pro cached input drops from $1.25 to $0.125/M tokens.
DeepSeek API Pricing: The Budget Disruptor
DeepSeek's unified pricing (late 2025) offers the lowest flagship-quality pricing in the market. The automatic disk-based caching system delivers 90% savings on repeated prompts.
| Model | Cache Hit Input | Cache Miss Input | Output |
|---|---|---|---|
| DeepSeek-V3.2 (chat) | $0.028 | $0.28 | $0.42 |
| DeepSeek-V3.2 (reasoner) | $0.028 | $0.28 | $0.42 |
DeepSeek Cache Benefits
- Automatic caching with 64-token minimum granularity
- 50-60% average cache hit rates (80% for repetitive workloads)
- 128K prompts: first-token latency drops from ~13 seconds to ~500ms
- No rate limits: designed for up to 1 trillion tokens per day
Mistral AI API Pricing (January 2026)
Mistral's Large 3 (December 2025) slashed flagship pricing by 75% to $0.50/$1.50, while Mistral Small 3.1 offers exceptional value at $0.03/$0.11.
| Model | Input (per 1M) | Output (per 1M) | Context |
|---|---|---|---|
| Mistral Large 3 | $0.50 | $1.50 | 262K |
| Mistral Medium 3.1 | $0.40 | $2.00 | 131K |
| Mistral Small 3.1 | $0.03 | $0.11 | 131K |
| Mistral Nemo | $0.02 | $0.04 | 131K |
| Codestral | $0.30 | $0.90 | 256K |
| Ministral 8B | $0.10 | $0.10 | 131K |
| Ministral 3B | $0.04 | $0.04 | 131K |
| Mistral Embed | $0.01 | — | — |
Batch API delivers 50% discount across all models.
Open-Source Model API Providers Compared
These providers host Llama, Mixtral, and other open-weights models with dramatically different pricing and speed characteristics.
Llama Model Pricing by Provider (per 1M tokens)
| Provider | Llama 3.1 8B | Llama 3.3 70B | Llama 4 Scout | Speed (TPS) |
|---|---|---|---|---|
| DeepInfra | $0.02/$0.03 | $0.10/$0.32 | $0.08/$0.30 | Fast |
| Groq | $0.05/$0.08 | $0.59/$0.79 | $0.11/$0.34 | 840 TPS |
| Together.ai | $0.18/$0.18 | $0.88/$0.88 | $0.18/$0.59 | Standard |
| Fireworks | $0.10-0.20 | $0.90/$0.90 | ~$0.20 | 300 TPS |
DeepInfra consistently delivers the lowest per-token pricing—Llama 3.1 8B Turbo at just $0.02/M input is 90% cheaper than Together.ai.
Groq leads in speed, achieving 275-840 tokens per second via their LPU architecture, with deterministic latency.
GPU Cloud Pricing for Self-Hosting (per hour)
| Provider | H100 80GB | A100 80GB | Notes |
|---|---|---|---|
| DeepInfra | $1.69 | $0.89 | Cheapest |
| RunPod | $2.69+ | $1.19+ | Community cloud |
| Modal | $3.95 | $2.50 | Serverless, $30/mo free |
| Together.ai | $3.36 | $2.56 | Dedicated endpoints |
| Fireworks | $4.00 | $2.90 | On-demand |
Cloud Provider AI Services
AWS Bedrock Pricing Analysis
Bedrock pricing ranges from exact parity to steep markups over direct API access, depending on the model:
| Model | Bedrock Price | Direct API | Markup |
|---|---|---|---|
| Claude 3 Haiku | $0.25/$1.25 | $0.25/$1.25 | None |
| Claude 3.5 Sonnet v2 | $6.00/$30.00 | $3.00/$15.00 | ~100% |
| Llama 3.3 70B | $0.72/$0.72 | ~$0.59/$0.79 | ~10% |
| DeepSeek-R1 | $1.35/$5.40 | $0.28/$0.42 | ~380% |
Hidden costs: Data transfer ($0.09-0.12/GB), guardrails ($0.10-0.17/1K text units), knowledge bases ($0.001/rerank query).
Azure OpenAI Pricing
Azure matches OpenAI's direct pricing for standard models. PTU (Provisioned Throughput Units) commitments can save 50-70% for high-volume users:
| Commitment | Savings | Minimum |
|---|---|---|
| 1-month | ~50% | ~$2,448/month per PTU |
| Annual | Up to 70% | Higher commitment |
| Fine-tuned hosting | — | ~$1,836-2,160/month minimum |
Budget Tier Analysis
Tier 1: Ultra-Budget (Under $0.10/M Input)
| Model | Input | Output | Best For |
|---|---|---|---|
| DeepSeek V3.2 (cache hit) | $0.028 | $0.42 | High-volume chat |
| Llama 3.1 8B (DeepInfra) | $0.02 | $0.03 | Simple tasks |
| Gemini 2.0 Flash-Lite | $0.075 | $0.30 | Multimodal |
| Mistral Nemo | $0.02 | $0.04 | Lightweight inference |
| Mistral Small 3.1 | $0.03 | $0.11 | General-purpose |
| Ministral 3B | $0.04 | $0.04 | Edge deployment |
Winner: DeepSeek V3.2 with cache hits at $0.028/M delivers flagship reasoning quality at 99% lower cost than GPT-4o.
Tier 2: Budget Value ($0.10-$1.00/M Input)
| Model | Input | Output | Best For |
|---|---|---|---|
| GPT-4o-mini | $0.15 | $0.60 | Balanced quality/cost |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | Multimodal, long context |
| Claude Haiku 3 | $0.25 | $1.25 | Fast responses |
| DeepSeek V3.2 (no cache) | $0.28 | $0.42 | Reasoning tasks |
| Mistral Large 3 | $0.50 | $1.50 | Complex tasks |
| Claude Haiku 3.5 | $0.80 | $4.00 | Production chat |
Winner: GPT-4o-mini at $0.15/M offers the best combination of capability, ecosystem integration, and developer experience.
Tier 3: Mid-Range ($1-5/M Input)
| Model | Input | Output | Best For |
|---|---|---|---|
| o3-mini | $1.10 | $4.40 | Reasoning with code |
| Gemini 2.5 Pro | $1.25 | $10.00 | Long context (1M tokens) |
| GPT-4.1 | $2.00 | $8.00 | 1M context window |
| GPT-4o | $2.50 | $10.00 | Multimodal flagship |
| Claude Sonnet 4.5 | $3.00 | $15.00 | Creative, long-form |
Tier 4: Premium ($5+/M Input)
| Model | Input | Output | Best For |
|---|---|---|---|
| Claude Opus 4.5 | $5.00 | $25.00 | Complex analysis |
| o1 | $15.00 | $60.00 | Advanced reasoning |
| o3 | $2.00 | $8.00 | Agentic workflows |
| o1-pro | $150.00 | $600.00 | Highest capability |
Free Tier Comparison for Startups
| Provider | Free Credits | Duration | Limits | Production Use |
|---|---|---|---|---|
| Google AI Studio | Unlimited (free tier) | Ongoing | 50-1,000 RPD | ✓ with attribution |
| OpenAI | $5 | 3 months | All models | ✓ |
| Anthropic | $5 | No expiry | Rate-limited | ✓ |
| Cohere | 1,000 calls/mo | Ongoing | 20 RPM | ✗ Trial only |
| AI21 Labs | $10 | 3 months | All APIs | ✓ |
| Voyage AI | 200M tokens | Ongoing | Per account | ✓ |
| Modal | $30/month | Ongoing | Compute credits | ✓ |
| Groq | Free tier | Ongoing | Rate-limited | ✓ |
| DeepSeek | Chat only | Ongoing | No API access | ✗ |
Best for startups: Google AI Studio's generous free tier with Gemini 2.5 Flash-Lite (1,000 RPD) enables substantial prototyping before any spend.
Hidden Costs and Gotchas
Rate Limits That Restrict Scaling
| Provider | Free Tier | Paid Tier 1 | Enterprise |
|---|---|---|---|
| OpenAI | 500 RPM, 200K TPM | 2,000 RPM, 2M TPM | 20,000+ RPM |
| Anthropic | Rate-limited | 4,000 RPM | Custom |
| Google | 50-1,000 RPD | Higher | Custom |
| DeepSeek | No fixed limits | No fixed limits | Same |
Minimum Charges and Commitments
| Service | Minimum Cost |
|---|---|
| Azure OpenAI PTU | ~$2,448/month per PTU |
| AWS Bedrock Provisioned | Starting $44/hour ($31K+/month) |
| Fine-tuned hosting (Azure) | ~$1,836-2,160/month |
| Modal Team plan | $250/month (includes $100 credits) |
Token Counting Gotchas
- O-series reasoning tokens: Internal thinking billed as output but hidden—actual costs often 2-5x estimates
- Claude extended thinking: Billed as output tokens at standard rates
- Tool use overhead: Adds 245-700 tokens per tool call depending on model
Cost Optimization Strategies
Batch API Savings (50% Discount)
| Provider | Batch Discount | Example Savings (1M tokens) |
|---|---|---|
| OpenAI | 50% | GPT-4o: $1.25 input (vs $2.50) |
| Anthropic | 50% | Haiku 3: $0.125 input (vs $0.25) |
| Google | 50% | Gemini 2.5 Pro: $0.625 (vs $1.25) |
| Mistral | 50% | All models |
| Groq | 50% | 24-hour to 7-day window |
Prompt Caching Delivers 75-90% Savings
| Provider | Cache Discount | Cache Duration |
|---|---|---|
| DeepSeek | 90% | Automatic disk-based |
| Anthropic | 90% | 5 minutes or 1 hour |
| Google | 75-90% | Variable |
| OpenAI (GPT-5) | 90% | 5-10 minutes |
| OpenAI (GPT-4o) | 50% | 5-10 minutes |
Optimal strategy: For repetitive system prompts, Anthropic's 1-hour cache typically breaks even within two to three cache reads; after that, every read costs 90% less.
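The break-even math can be sketched directly. The 1.25x write multiplier for the 5-minute cache matches the Claude caching table earlier ($3.75 vs $3.00 for Sonnet 4.5); the 2x multiplier for the 1-hour cache is an assumption, so treat the results as illustrative:

```python
# Break-even for prompt caching: writing costs a premium over standard input,
# then each read costs only 10% of base. Break-even is the read count where
# the cached total drops below the uncached total.

def cache_breakeven(write_mult: float, read_mult: float = 0.10) -> int:
    """Smallest number of reads where caching beats re-sending the prompt."""
    n = 1
    # cached cost: write_mult + n reads at read_mult
    # uncached cost: 1 initial send + n full re-sends
    while write_mult + n * read_mult >= 1 + n:
        n += 1
    return n

print(cache_breakeven(1.25))  # assumed 5-minute cache write multiplier
print(cache_breakeven(2.00))  # assumed 1-hour cache write multiplier
```

The exact break-even count depends entirely on the write multiplier, which is why short-lived caches pay off almost immediately while longer-lived ones need a few reuses.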
Real-World Cost Calculations
Cost for 1M Input + 1M Output Tokens
| Model | Standard | With Batch | With Cache + Batch |
|---|---|---|---|
| GPT-4o (in+out) | $12.50 | $6.25 | $3.75 |
| Claude Haiku 3 | $1.50 | $0.75 | $0.15 |
| Gemini 2.5 Flash-Lite | $0.50 | $0.25 | $0.075 |
| DeepSeek V3.2 | $0.70 | — | $0.07 |
Monthly Cost for Typical Workloads
Chatbot (10,000 conversations/month at ~2K tokens each; costs below assume roughly 20M tokens each of input and output):
| Model | Monthly Cost |
|---|---|
| DeepSeek V3.2 (cached) | $1.40 |
| GPT-4o-mini | $15 |
| Claude Haiku 3 | $30 |
| GPT-4o | $250 |
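The monthly figures above can be reproduced with simple arithmetic. The 20M-input/20M-output split is an assumption that matches most rows of the table:

```python
# Monthly spend from per-million-token prices and monthly token volumes.

def monthly_cost(m_input_tokens: float, m_output_tokens: float,
                 input_price: float, output_price: float) -> float:
    """Monthly cost in dollars, token volumes given in millions."""
    return m_input_tokens * input_price + m_output_tokens * output_price

print(f"GPT-4o-mini:   ${monthly_cost(20, 20, 0.15, 0.60):.2f}")
print(f"Claude Haiku 3: ${monthly_cost(20, 20, 0.25, 1.25):.2f}")
print(f"GPT-4o:        ${monthly_cost(20, 20, 2.50, 10.00):.2f}")
```

Swapping in your own volumes and any price row from this article gives a quick budget estimate before committing to a provider.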
Code Generation (100K requests/month, 5K tokens avg):
| Model | Monthly Cost |
|---|---|
| DeepSeek V3.2 | $210 |
| GPT-4o-mini | $375 |
| Codestral | $600 |
Speed vs Cost Tradeoffs
| Provider/Model | Speed (TPS) | Cost ($/M output) | Latency |
|---|---|---|---|
| Groq (Llama 3.1 8B) | 840 | $0.08 | Lowest |
| Groq (Llama 3.3 70B) | 394 | $0.79 | Low |
| Fireworks (Mixtral) | 300 | $0.50 | Low |
| OpenAI (GPT-4o) | ~100 | $10.00 | Medium |
| Anthropic (Haiku 3) | ~150 | $1.25 | Medium |
| DeepSeek (V3.2) | Variable | $0.42 | Variable |
When to pay for speed:
- Real-time chat interfaces: Groq's sub-second TTFT justifies premium
- Batch processing: Speed irrelevant—optimize purely on cost
- Agentic workflows: Latency compounds across tool calls
Provider Rankings by Use Case
Cheapest Overall (Per-Token Cost)
- DeepSeek V3.2 (cache hit): $0.028/$0.42
- Llama 3.1 8B (DeepInfra): $0.02/$0.03
- Mistral Nemo: $0.02/$0.04
- Gemini 2.0 Flash-Lite: $0.075/$0.30
- GPT-4o-mini: $0.15/$0.60
Best for Startups (Free Tier + Value)
- Google AI Studio (Gemini 2.5 Flash-Lite): Generous free, low paid
- OpenAI (GPT-4o-mini): $5 credit, excellent docs
- Groq: Free tier with blazing speed
- Modal: $30/month free compute
Best for Enterprise (SLAs + Compliance)
- Azure OpenAI: Full Microsoft enterprise stack
- AWS Bedrock: Wide model selection, AWS integration
- Google Vertex AI: Enterprise features, data residency
- Anthropic (direct): Tier 4 for production SLAs
Best for Coding
- Codestral: $0.30/$0.90, 256K context
- DeepSeek V3.2: Unified pricing, excellent code quality
- GPT-4o-mini: Strong coding at $0.15/$0.60
- Claude Sonnet 4.5: Superior for complex refactoring
Best for Embeddings
- Mistral Embed: $0.01/M tokens
- text-embedding-3-small: $0.02/M (batch: $0.01)
- Voyage AI voyage-3.5-lite: $0.02/M + 200M free
- Google Embedding: $0.15/M (free tier available)
Best for Vision/Multimodal
- Gemini 2.5 Flash: $0.30 input, native multimodal
- GPT-4o: $2.50 input, excellent vision
- Claude Sonnet 4.5: $3.00 input, document understanding
- Pixtral 12B (Mistral): $0.10/$0.10 budget vision
Conclusion: Strategic Pricing Recommendations
The 2026 AI API landscape rewards strategic optimization. DeepSeek's automatic caching delivers the lowest effective costs for high-volume applications—$0.028/M tokens is transformational for chatbots and support systems. For mainstream production workloads, GPT-4o-mini ($0.15/$0.60) and Gemini 2.5 Flash-Lite ($0.10/$0.40) offer the optimal balance of capability, reliability, and cost.
Three strategies consistently reduce costs by 50-90%:
- Batch APIs for non-real-time workloads (50% savings)
- Prompt caching for repetitive system prompts (75-90% savings)
- Tiered model selection—route simple queries to Haiku/mini models
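The tiered-selection strategy can be sketched as a minimal router. The length thresholds, keyword heuristic, and model names are placeholders; production routers typically use a classifier or cascade on the cheap model's confidence instead:

```python
# A minimal tiered-router sketch: send short, simple queries to a budget
# model and escalate long or complex ones to pricier tiers.

def pick_model(query: str) -> str:
    """Route by crude complexity heuristics; returns a model name."""
    complex_markers = ("analyze", "refactor", "prove", "multi-step")
    if len(query) > 2_000 or any(m in query.lower() for m in complex_markers):
        return "claude-sonnet-4.5"      # premium tier
    if len(query) > 400:
        return "gpt-4o-mini"            # mid tier
    return "gemini-2.5-flash-lite"      # budget tier

print(pick_model("What's the capital of France?"))      # budget tier
print(pick_model("Please analyze this quarterly data"))  # premium tier
```

Even a crude router like this captures most of the savings, since the bulk of real traffic is short, simple queries that never need the premium tier.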
For startups, Google AI Studio's free tier combined with DeepInfra's $0.02/M Llama pricing enables building significant AI products before meaningful spend. Enterprises should evaluate Azure OpenAI's PTU commitments (up to 70% savings) for predictable high-volume needs.
The key insight: the cheapest API isn't always the best value. Match your provider to your actual requirements—latency sensitivity, compliance needs, model diversity—rather than optimizing purely on per-token cost.