The AI model landscape is changing fast in 2025. New models like GPT‑4.1, Claude 3.7 Sonnet, Gemini 2.5 Pro, and Grok‑3 offer bigger context windows and better performance, but pricing still matters. Whether you’re building a chatbot, analyzing documents, or automating workflows, knowing the cost per million tokens (MTok) helps you choose the right model.

LLM Pricing Comparison (April 2025)

This guide compares the latest LLMs based on API pricing for input and output tokens. It also shows their context window size—how much text you can send and receive in one go.

Pricing Summary Table (April 2025)

Below is a simplified comparison of LLM pricing across major providers. All prices are per 1 million tokens (MTok) for API usage. Models with non-token pricing or image/audio-only use cases are excluded.

| Model | Input Price (per 1M tokens) | Output Price (per 1M tokens) | Context Length |
| --- | --- | --- | --- |
| Alibaba Qwen-Plus-0125 | $0.40 | $1.20 | 131k |
| Alibaba Qwen2.5-Max | $1.60 | $6.40 | 32k |
| Claude 3.5 Haiku | $0.80 | $4.00 | 200k |
| Claude 3.7 Sonnet | $3.00 | $15.00 | 200k |
| Claude 3 Opus | $15.00 | $75.00 | 200k |
| Cohere Command A | $2.50 | $10.00 | 256k |
| DeepSeek V3 | $0.27 | $1.10 | 64k |
| DeepSeek R1 | $0.55 | $2.19 | 64k |
| Gemini 1.5 Flash | n/a | n/a | 1M |
| Gemini 1.5 Pro | n/a | n/a | 2M |
| Gemini 2.0 Flash | $0.10 | $0.40 | 1M |
| Gemini 2.0 Flash-Lite | $0.075 | $0.30 | Not specified |
| Gemini 2.5 Pro Preview | $2.50 | $15.00 | 1M |
| gpt-4.1 | $2.00 | $8.00 | 1M |
| gpt-4.1-mini | $0.40 | $1.60 | 128k |
| gpt-4.1-nano | $0.10 | $0.40 | 128k |
| gpt-4.5-preview | $75.00 | $150.00 | 128k |
| gpt-4o | $2.50 | $10.00 | 128k |
| gpt-4o-mini | $0.15 | $0.60 | 128k |
| gpt-4o-mini-realtime | $0.60 | $2.40 | 128k |
| gpt-4o-realtime | $5.00 | $20.00 | 128k |
| grok-3-beta | $3.00 | $15.00 | 131k |
| grok-3-mini-beta | $0.30 | $0.50 | 131k |
| grok-3-mini-fast-beta | $0.60 | $4.00 | 131k |
| o1-mini | $1.10 | $4.40 | 200k |
| o1-pro | $150.00 | $600.00 | 200k |

Note: Prices reflect text input/output only. Cached inputs, audio/image processing, and batch discounts are not included.
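
To turn the per-MTok prices above into a per-request figure, multiply each token count by its rate and divide by one million. Here is a minimal sketch in Python, using gpt-4.1-mini’s table prices as the example (the function name is just for illustration):

```python
# Estimate the USD cost of one API call from per-MTok prices.
# Example rates: gpt-4.1-mini from the table ($0.40 in / $1.60 out).
INPUT_PRICE_PER_MTOK = 0.40
OUTPUT_PRICE_PER_MTOK = 1.60

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of a single request in USD."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000

# A 10,000-token prompt with a 1,000-token reply:
print(f"${request_cost(10_000, 1_000):.4f}")  # $0.0056
```

The same arithmetic works for any row in the table; just swap in that model’s rates.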

Longest Context Models

Context length tells you how much text a model can handle in one go. Bigger context helps with tasks like summarizing long reports, coding across files, or multi-turn instructions.

Gemini 1.5 Pro – 2M Tokens

This model leads with a massive 2 million token context. That’s enough for 8+ books or 50,000+ lines of code. Pricing is not available yet.
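
To get a feel for what a context window holds, a common heuristic is roughly 0.75 English words per token (this ratio is approximate and varies by tokenizer and text):

```python
# Back-of-envelope: how many English words fit in a context window,
# assuming ~0.75 words per token (a rough heuristic, not exact).
WORDS_PER_TOKEN = 0.75

def words_that_fit(context_tokens: int) -> int:
    return int(context_tokens * WORDS_PER_TOKEN)

print(words_that_fit(2_000_000))  # 1500000 words
```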

Gemini 1.5 Flash, Gemini 2.0 Flash, Gemini 2.5 Pro, GPT-4.1 – 1M Tokens

These models support 1 million token windows.

  • Gemini 2.0 Flash is the most affordable at $0.10 input / $0.40 output.
  • GPT-4.1 is more powerful but higher priced at $2.00 input / $8.00 output.
  • Gemini 2.5 Pro is in preview, priced at $2.50 input / $15.00 output.

Others

  • Claude 3 and o1/o3 series offer up to 200k tokens.
  • Cohere Command A offers 256k, good for long documents and multi-turn workflows.

For serious long-text processing, Gemini 1.5 Pro is unmatched, though its API pricing has not yet been published. GPT-4.1 is a solid paid option with broad support.

Lowest Cost Models

If you’re running high-volume tasks or working on a budget, these models offer the best pricing per million tokens.

Gemini 2.0 Flash-Lite – $0.075 input / $0.30 output

The cheapest on the list. Great for fast, lightweight tasks like summaries, replies, and basic Q&A. Context length not specified, but likely under 1M.

GPT-4.1 Nano – $0.10 input / $0.40 output

Ideal for small apps, assistant bots, or prompt chaining. Fast and efficient with a 128k context window.

Gemini 2.0 Flash – $0.10 input / $0.40 output

Same cost as GPT-4.1 Nano but offers a 1M context window—useful for summarizing large docs or long conversations.

Grok-3 Mini Beta – $0.30 input / $0.50 output

One of the most cost-effective from xAI’s Grok family. With 131k context, it balances price and capacity well.

These models are perfect for startups, content tools, and automation pipelines that process lots of small queries.
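
Choosing among these often comes down to filtering by required context and sorting by price. Below is a hypothetical helper using the table’s prices and contexts; the 3:1 input-to-output blend is an assumption about typical traffic, not a provider-published metric:

```python
# Hypothetical helper: cheapest model that meets a minimum context size.
# Prices/contexts taken from the table above.
MODELS = {
    # name: (input $/MTok, output $/MTok, context window in tokens)
    "gemini-2.0-flash-lite": (0.075, 0.30, None),  # context not specified
    "gpt-4.1-nano":          (0.10,  0.40, 128_000),
    "gemini-2.0-flash":      (0.10,  0.40, 1_000_000),
    "grok-3-mini-beta":      (0.30,  0.50, 131_000),
}

def cheapest(min_context: int) -> str:
    blended = {
        name: (3 * inp + out) / 4  # blended $/MTok at an assumed 3:1 ratio
        for name, (inp, out, ctx) in MODELS.items()
        if ctx is not None and ctx >= min_context
    }
    return min(blended, key=blended.get)

print(cheapest(500_000))  # gemini-2.0-flash
```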

High-End Premium Models

Some models come with high pricing, designed for advanced or specialized tasks. These are not meant for casual or lightweight use.

o1-pro – $150.00 input / $600.00 output

This is the most expensive model in the list. It’s built for complex enterprise-level use cases, such as advanced reasoning or specialized domain tasks. It supports up to 200k tokens in context.

gpt-4.5-preview – $75.00 input / $150.00 output

Another high-cost option from OpenAI. Though limited to 128k context, it’s likely focused on cutting-edge capabilities or evaluation access.

These models are meant for users who need maximum performance, accuracy, or experimentation with top-tier AI capabilities. For most users, they are overkill unless the project specifically demands it.

Best Value Models

Some models offer a good balance between price, performance, and context length. These are practical choices for most developers and businesses.

gpt-4.1 – $2.00 input / $8.00 output – 1M context

Offers high performance with a large context window. Suitable for long conversations, content generation, and multi-step workflows.

Claude 3.5 Haiku – $0.80 input / $4.00 output – 200k context

From Anthropic, this model provides solid reasoning at a moderate cost. Good for productivity apps and writing tools.

deepseek-chat (DeepSeek-V3) – $0.27 input / $1.10 output – 64k context

An affordable and capable model for casual and semi-technical tasks. Ideal for chatbot services and API automation.

gpt-4o-mini – $0.15 input / $0.60 output – 128k context

Cost-effective and fast. Good for real-time interactions, summaries, or customer support bots.

grok-3-mini-beta – $0.30 input / $0.50 output – 131k context

Balanced pricing and speed from xAI. Can handle moderate-length input with decent performance.

These models serve most practical use cases well without overspending. Ideal for developers who want strong output without premium costs.
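
To see how these tiers differ at scale, here is a hypothetical monthly estimate for 100,000 requests of 1,500 input and 400 output tokens each, comparing gpt-4o-mini and gpt-4.1 at the table prices (the workload numbers are illustrative assumptions):

```python
# Hypothetical workload: 100k requests/month, 1,500 input and 400 output
# tokens each. Per-MTok prices come from the table above.
def monthly_cost(requests, in_tokens, out_tokens, in_price, out_price):
    return requests * (in_tokens * in_price + out_tokens * out_price) / 1_000_000

print(monthly_cost(100_000, 1_500, 400, 0.15, 0.60))  # gpt-4o-mini: 46.5
print(monthly_cost(100_000, 1_500, 400, 2.00, 8.00))  # gpt-4.1:     620.0
```

At this volume the mid-tier model is over an order of magnitude cheaper, which is why matching the model to the task matters.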

Provider-wise Summary

Here’s a quick look at what each major provider offers, based on pricing, context length, and range of models.

OpenAI

  • Offers a wide range—from low-cost options like gpt-4.1-nano ($0.10 input) to premium models like o1-pro ($150 input).
  • gpt-4.1 stands out for its 1M context at a mid-range price.
  • gpt-4o-mini is a good pick for low-latency tasks.

xAI (Grok)

  • Provides flexible options with 131k context length.
  • grok-3-mini-beta is very affordable ($0.30 input / $0.50 output), while grok-3-fast-beta ($5.00 input / $25.00 output) is for heavier tasks.

Google (Gemini)

  • Leading in context size with Gemini 1.5 Pro (2M tokens) and 1.5 Flash (1M).
  • Gemini 2.0 Flash and Flash-Lite are among the lowest priced.
  • Good balance of scale and affordability, but pricing is not available for all models.

Anthropic (Claude)

  • Focuses on reasoning and thoughtful output.
  • Claude 3.5 Haiku is affordable ($0.80 input), while Claude 3 Opus is expensive but powerful.
  • All Claude models offer 200k context.

DeepSeek

  • Budget-friendly with solid performance.
  • deepseek-chat (V3) and deepseek-reasoner (R1) are good choices for developers needing fast, low-cost results with 64k context.

Cohere

  • Offers a single known model, Command A, with a 256k context window.
  • Priced at $2.50 input / $10 output, suitable for document-heavy tasks.

Alibaba Qwen

  • Underrated but very affordable.
  • Qwen-Plus-0125 is just $0.40 input / $1.20 output with 131k context.
  • Qwen2.5-Max costs more ($1.60 input / $6.40 output) and has a smaller 32k context limit.

Each provider has a different strength—whether it’s cost, speed, or long-context processing. Choosing depends on your specific use case and budget.

Things Not Included

This comparison focuses only on basic input/output token pricing for text-based API calls. Several other factors are not covered in this list:

  • Cache pricing is excluded. Some models offer reduced pricing for repeated prompts (e.g., Claude 3.7 Sonnet has a $0.30 cached prompt read).
  • Batch API discounts are not listed. For example, Claude and OpenAI offer up to 50% off in batch mode.
  • Audio, image, and multi-modal pricing is excluded. This includes models like grok-2-image and Gemini audio input.
  • Free and open-source models such as LLaMA or Falcon are not included here. These have no token cost but require server hosting.
  • Rate limits and free tiers are not considered. Some models are free under preview (e.g., Gemini 2.5 Pro Preview) but have usage caps.

If you are working with images, audio, or cached prompts, check each provider’s full documentation for detailed pricing.
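
As a rough illustration of how a batch discount changes the math (the 50% rate is the figure cited above for Claude and OpenAI batch mode; confirm the exact rate and eligibility in each provider’s docs):

```python
# Sketch: effect of a batch discount on token cost.
def batch_cost(in_tokens, out_tokens, in_price, out_price, discount=0.5):
    base = (in_tokens * in_price + out_tokens * out_price) / 1_000_000
    return base * (1 - discount)

# 1M input + 200k output tokens on gpt-4o ($2.50 / $10.00), batched:
print(batch_cost(1_000_000, 200_000, 2.50, 10.00))  # 2.25
```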

Closing Thoughts

Choosing the right LLM in 2025 depends on what you need—speed, price, or context length. If your task involves long documents or code, models like GPT-4.1 or Gemini 2.0 Flash offer 1M context at reasonable rates. For cost-focused work, gpt-4.1-nano or Gemini Flash-Lite provide great value.

Premium models like Claude 3 Opus and o1-pro are powerful but expensive, best for advanced enterprise use. Meanwhile, middle-tier models like Claude 3.5 Haiku, gpt-4o-mini, and deepseek-chat balance output quality with affordability.

This comparison helps narrow down your options, but pricing and availability may change. For the latest details, always check official sources such as OpenAI, Google AI Studio, xAI, Anthropic, and DeepSeek.
