The AI model landscape is changing fast in 2025. New models like GPT‑4.1, Claude 3.7 Sonnet, Gemini 2.5 Pro, and Grok‑3 are offering bigger context windows and better performance. But pricing still matters. Whether you’re building a chatbot, analyzing documents, or automating workflows, knowing the cost per million tokens (MTok) helps you choose the right model.

This guide compares the latest LLMs based on API pricing for input and output tokens. It also shows their context window size—how much text you can send and receive in one go.
Pricing Summary Table (April 2025)
Below is a simplified comparison of LLM pricing across major providers. All prices are per 1 million tokens (MTok) for API usage. Models with non-token pricing or image/audio-only use cases are excluded.
| Model | Input Price (per 1M tokens) | Output Price (per 1M tokens) | Context Length |
|---|---|---|---|
| Alibaba Qwen-Plus-0125 | $0.40 | $1.20 | 131k |
| Alibaba Qwen2.5-Max | $1.60 | $6.40 | 32k |
| Claude 3.5 Haiku | $0.80 | $4.00 | 200k |
| Claude 3.7 Sonnet | $3.00 | $15.00 | 200k |
| Claude 3 Opus | $15.00 | $75.00 | 200k |
| Cohere Command A | $2.50 | $10.00 | 256k |
| DeepSeek V3 | $0.27 | $1.10 | 64k |
| DeepSeek R1 | $0.55 | $2.19 | 64k |
| Gemini 1.5 Flash | n/a | n/a | 1M |
| Gemini 1.5 Pro | n/a | n/a | 2M |
| Gemini 2.0 Flash | $0.10 | $0.40 | 1M |
| Gemini 2.0 Flash-Lite | $0.075 | $0.30 | Not Specified |
| Gemini 2.5 Pro Preview | $2.50 | $15.00 | 1M |
| gpt-4.1 | $2.00 | $8.00 | 1M |
| gpt-4.1-mini | $0.40 | $1.60 | 128k |
| gpt-4.1-nano | $0.10 | $0.40 | 128k |
| gpt-4.5-preview | $75.00 | $150.00 | 128k |
| gpt-4o | $2.50 | $10.00 | 128k |
| gpt-4o-mini | $0.15 | $0.60 | 128k |
| gpt-4o-mini-realtime | $0.60 | $2.40 | 128k |
| gpt-4o-realtime | $5.00 | $20.00 | 128k |
| grok-3-beta | $3.00 | $15.00 | 131k |
| grok-3-mini-beta | $0.30 | $0.50 | 131k |
| grok-3-mini-fast-beta | $0.60 | $4.00 | 131k |
| o1-mini | $1.10 | $4.40 | 200k |
| o1-pro | $150.00 | $600.00 | 200k |
Note: Prices reflect text input/output only. Cached inputs, audio/image processing, and batch discounts are not included.
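As a quick sanity check on the rates above, per-request cost is just token counts multiplied by the per-MTok prices. A minimal sketch (prices copied from the table; the helper is illustrative, not any provider's SDK):

```python
# Estimate API cost from per-1M-token (MTok) rates.
# Prices are taken from the table above (USD per 1M tokens).
PRICES = {
    "gpt-4.1":           {"input": 2.00, "output": 8.00},
    "gpt-4o-mini":       {"input": 0.15, "output": 0.60},
    "claude-3.7-sonnet": {"input": 3.00, "output": 15.00},
    "gemini-2.0-flash":  {"input": 0.10, "output": 0.40},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one API call, given token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a 10k-token prompt with a 1k-token reply on gpt-4.1:
# 10_000 * $2.00/1M + 1_000 * $8.00/1M = $0.02 + $0.008 = $0.028
print(f"${request_cost('gpt-4.1', 10_000, 1_000):.4f}")
```

The same arithmetic works for any model in the table; swap in the rates you care about.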
Longest Context Models
Context length tells you how much text a model can handle in a single request. A bigger window helps with tasks like summarizing long reports, coding across files, or multi-turn instructions.
Gemini 1.5 Pro – 2M Tokens
This model leads with a massive 2 million token context. That’s enough for 8+ books or 50,000+ lines of code. Pricing is not available yet.
Gemini 1.5 Flash, Gemini 2.0 Flash, Gemini 2.5 Pro, GPT-4.1 – 1M Tokens
These models support 1 million token windows.
- Gemini 2.0 Flash is the most affordable at $0.10 input / $0.40 output.
- GPT-4.1 is more powerful but higher priced at $2.00 input / $8.00 output.
- Gemini 2.5 Pro is in preview, priced at $2.50 input / $15.00 output.
Others
- The Claude 3 models and the o1 series offer up to 200k tokens.
- Cohere Command A offers 256k, good for long documents and multi-turn workflows.
For serious long-text processing, Gemini 1.5 Pro is unmatched (if pricing is not a concern). GPT-4.1 is a solid paid option with broad support.
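Before picking a window size, it helps to estimate whether your documents actually fit. A rough sketch using the common ~4 characters-per-token heuristic for English text (real counts vary by tokenizer, so treat this as an estimate only; the `CONTEXT_LIMITS` values come from the table above):

```python
# Rough check of whether a document fits a model's context window.
# Assumes ~4 characters per token for English text -- an estimate only;
# exact counts require each provider's own tokenizer.
CONTEXT_LIMITS = {            # tokens, from the table above
    "gemini-1.5-pro":    2_000_000,
    "gpt-4.1":           1_000_000,
    "claude-3.7-sonnet":   200_000,
    "gpt-4o":              128_000,
}

def estimated_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def fits(model: str, text: str, reserve_for_output: int = 4_096) -> bool:
    """True if the text plus a reserved output budget fits the window."""
    return estimated_tokens(text) + reserve_for_output <= CONTEXT_LIMITS[model]

doc = "word " * 100_000        # ~500k characters -> ~125k estimated tokens
print(fits("gpt-4o", doc))     # 128k window: too tight once output is reserved
print(fits("gpt-4.1", doc))    # 1M window: fits easily
```

Reserving output tokens matters: a prompt that "just fits" the window leaves no room for the reply.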
Lowest Cost Models
If you’re running high-volume tasks or working on a budget, these models offer the best pricing per million tokens.
Gemini 2.0 Flash-Lite – $0.075 input / $0.30 output
The cheapest on the list. Great for fast, lightweight tasks like summaries, replies, and basic Q&A. Context length not specified, but likely under 1M.
GPT-4.1 Nano – $0.10 input / $0.40 output
Ideal for small apps, assistant bots, or prompt chaining. Fast and efficient with a 128k context window.
Gemini 2.0 Flash – $0.10 input / $0.40 output
Same cost as GPT-4.1 Nano but offers a 1M context window—useful for summarizing large docs or long conversations.
Grok-3 Mini Beta – $0.30 input / $0.50 output
One of the most cost-effective from xAI’s Grok family. With 131k context, it balances price and capacity well.
These models are perfect for startups, content tools, and automation pipelines that process lots of small queries.
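To see how these budget tiers separate at volume, here is a back-of-the-envelope monthly estimate. The workload figures (1M requests/month, ~500 input and ~200 output tokens each) are assumptions for illustration; prices come from the table:

```python
# Monthly cost at volume for the budget models discussed above.
# Assumed workload: 1,000,000 requests/month, ~500 input + ~200 output tokens each.
BUDGET_MODELS = {              # (input, output) USD per 1M tokens, from the table
    "gemini-2.0-flash-lite": (0.075, 0.30),
    "gpt-4.1-nano":          (0.10, 0.40),
    "gemini-2.0-flash":      (0.10, 0.40),
    "grok-3-mini-beta":      (0.30, 0.50),
}

REQUESTS = 1_000_000
IN_TOK, OUT_TOK = 500, 200

for model, (p_in, p_out) in BUDGET_MODELS.items():
    monthly = REQUESTS * (IN_TOK * p_in + OUT_TOK * p_out) / 1_000_000
    print(f"{model:24s} ${monthly:,.2f}/month")
```

Under these assumptions, Flash-Lite lands around $97.50/month versus $130 for gpt-4.1-nano and $250 for grok-3-mini-beta. The gaps are small per request but compound quickly at scale.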
High-End Premium Models
Some models come with high pricing, designed for advanced or specialized tasks. These are not meant for casual or lightweight use.
o1-pro – $150.00 input / $600.00 output
This is the most expensive model in the list. It’s built for complex enterprise-level use cases, such as advanced reasoning or specialized domain tasks. It supports up to 200k tokens in context.
gpt-4.5-preview – $75.00 input / $150.00 output
Another high-cost option from OpenAI. Though limited to 128k context, it’s likely focused on cutting-edge capabilities or evaluation access.
These models are meant for users who need maximum performance, accuracy, or experimentation with top-tier AI capabilities. For most users, they are overkill unless the project specifically demands it.
Best Value Models
Some models offer a good balance between price, performance, and context length. These are practical choices for most developers and businesses.
gpt-4.1 – $2.00 input / $8.00 output – 1M context
Offers high performance with a large context window. Suitable for long conversations, content generation, and multi-step workflows.
Claude 3.5 Haiku – $0.80 input / $4.00 output – 200k context
From Anthropic, this model provides solid reasoning at a moderate cost. Good for productivity apps and writing tools.
deepseek-chat (DeepSeek-V3) – $0.27 input / $1.10 output – 64k context
An affordable and capable model for casual and semi-technical tasks. Ideal for chatbot services and API automation.
gpt-4o-mini – $0.15 input / $0.60 output – 128k context
Cost-effective and fast. Good for real-time interactions, summaries, or customer support bots.
grok-3-mini-beta – $0.30 input / $0.50 output – 131k context
Balanced pricing and speed from xAI. Can handle moderate-length input with decent performance.
These models serve most practical use cases well without overspending. Ideal for developers who want strong output without premium costs.
Provider-wise Summary
Here’s a quick look at what each major provider offers, based on pricing, context length, and range of models.
OpenAI
- Offers a wide range—from low-cost options like gpt-4.1-nano ($0.10 input) to premium models like o1-pro ($150 input).
- gpt-4.1 stands out for its 1M context at a mid-range price.
- gpt-4o-mini is a good pick for low-latency tasks.
xAI (Grok)
- Provides flexible options with 131k context length.
- grok-3-mini-beta is very affordable ($0.30 input / $0.50 output), while grok-3-mini-fast-beta ($0.60 input / $4.00 output) targets latency-sensitive, heavier tasks.
Google (Gemini)
- Leading in context size with Gemini 1.5 Pro (2M tokens) and 1.5 Flash (1M).
- Gemini 2.0 Flash and Flash-Lite are among the lowest priced.
- Good balance of scale and affordability, but pricing is not available for all models.
Anthropic (Claude)
- Focuses on reasoning and thoughtful output.
- Claude 3.5 Haiku is affordable ($0.80 input), while Claude 3 Opus is expensive but powerful.
- All Claude models offer 200k context.
DeepSeek
- Budget-friendly with solid performance.
- deepseek-chat (V3) and deepseek-reasoner (R1) are good choices for developers needing fast, low-cost results with 64k context.
Cohere
- Offers a single known model, Command A, with a 256k context window.
- Priced at $2.50 input / $10 output, suitable for document-heavy tasks.
Alibaba Qwen
- Underrated but very affordable.
- Qwen-Plus-0125 is just $0.40 input / $1.20 output with 131k context.
- Qwen2.5-Max costs more and is limited to a 32k context.
Each provider has a different strength—whether it’s cost, speed, or long-context processing. Choosing depends on your specific use case and budget.
Things Not Included
This comparison focuses only on basic input/output token pricing for text-based API calls. Several other factors are not covered in this list:
- Cache pricing is excluded. Some providers charge less for repeated prompts (e.g., Claude 3.7 Sonnet reads cached prompts at $0.30 per 1M tokens).
- Batch API discounts are not listed. For example, Claude and OpenAI offer up to 50% off in batch mode.
- Audio, image, and multi-modal pricing is excluded. This includes models like grok-2-image and Gemini audio input.
- Free and open-source models such as LLaMA or Falcon are not included here. These have no token cost but require server hosting.
- Rate limits and free tiers are not considered. Some models are free under preview (e.g., Gemini 2.5 Pro Preview) but have usage caps.
If you are working with images, audio, or cached prompts, check each provider’s full documentation for detailed pricing.
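For a sense of how much these excluded factors can matter, here is the arithmetic for the two examples cited above, a cached-prompt read rate and a 50% batch discount, using Claude 3.7 Sonnet's table prices. The helper is a sketch, not Anthropic's billing logic; always confirm current rates:

```python
# Effect of prompt caching and batch discounts on per-request cost.
# Rates are the examples cited in this article (Claude 3.7 Sonnet):
#   standard: $3.00 input / $15.00 output per 1M tokens
#   cached prompt read: $0.30 per 1M tokens
#   batch mode: assumed 50% off the total (the "up to 50%" figure above)
def sonnet_cost(input_tok: int, output_tok: int,
                cached_tok: int = 0, batch: bool = False) -> float:
    fresh = input_tok - cached_tok
    cost = (fresh * 3.00 + cached_tok * 0.30 + output_tok * 15.00) / 1_000_000
    return cost * 0.5 if batch else cost

base = sonnet_cost(100_000, 5_000)                        # $0.375
cached = sonnet_cost(100_000, 5_000, cached_tok=90_000)   # $0.132
print(f"standard: ${base:.4f}, with cache: ${cached:.4f}")
```

With 90% of a 100k-token prompt served from cache, the request drops from $0.375 to $0.132, which is why cache and batch rates are worth checking even though they are outside this comparison.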
Closing Thoughts
Choosing the right LLM in 2025 depends on what you need—speed, price, or context length. If your task involves long documents or code, models like GPT-4.1 or Gemini 2.0 Flash offer 1M context at reasonable rates. For cost-focused work, gpt-4.1-nano or Gemini Flash-Lite provide great value.
Premium models like Claude 3 Opus and o1-pro are powerful but expensive, best for advanced enterprise use. Meanwhile, middle-tier models like Claude 3.5 Haiku, gpt-4o-mini, and deepseek-chat balance output quality with affordability.
This comparison helps narrow down your options, but pricing and availability may change. For the latest details, always check official sources like OpenAI, Google AI Studio, xAI, Anthropic, and DeepSeek.