GPT-5 Cost Calculator

Estimate OpenAI GPT-5 family API spend before shipping an agent, RAG flow, automation, or batch job. Pricing is read from the OpenAI JSON rule file, not hardcoded in this page.

Model rule: GPT-5.5
Input / 1M tokens: $5
Output / 1M tokens: $30

GPT-5.5 Token Cost Calculator

Updated 2026-05-29

Estimated API cost

Total: $0.08
Input: $0.05
Cached input: $0.00
Output: $0.03
Per 1,000 requests: $80.00

Cost Per Million Tokens

Model          Input    Cached input   Output   Context
GPT-5.5        $5.00    $0.50          $30.00   Provider dependent
GPT-5.4        $2.50    $0.25          $15.00   Provider dependent
GPT-5.4 Mini   $0.75    $0.075         $4.50    Provider dependent

How GPT-5 Token Pricing Works

The estimate multiplies billable (uncached) input tokens by the input rate and generated output tokens by the output rate, then adds the two. Rates are quoted per 1M tokens, so each token count is divided by 1,000,000 before multiplying. Cached input tokens are billed at the lower cached-input rate when the request qualifies for prompt caching.
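The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code: the function name `estimate_request_cost` is hypothetical, the rates are copied from the GPT-5.5 row of the table on this page, and the token counts (10,000 input, 1,000 output) are chosen so that the result matches the sample figures shown above.

```python
# Rates in USD per 1M tokens, copied from the GPT-5.5 row on this page.
RATES_PER_MILLION = {"input": 5.00, "cached_input": 0.50, "output": 30.00}


def estimate_request_cost(input_tokens, output_tokens, cached_input_tokens=0,
                          rates=RATES_PER_MILLION):
    """Estimated USD cost of one request.

    input_tokens counts only uncached input; cached tokens are passed
    separately and billed at the cached-input rate.
    """
    return (
        input_tokens * rates["input"]
        + cached_input_tokens * rates["cached_input"]
        + output_tokens * rates["output"]
    ) / 1_000_000


# Assumed token mix: 10,000 uncached input tokens and 1,000 output tokens.
cost = estimate_request_cost(input_tokens=10_000, output_tokens=1_000)
print(f"${cost:.2f} per request, ${cost * 1000:.2f} per 1,000 requests")
# prints: $0.08 per request, $80.00 per 1,000 requests
```

Swapping in another row of the rate table only requires passing a different `rates` dictionary.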

When GPT-5 Gets Expensive

GPT-5 cost rises fastest when an app sends large retrieved documents, keeps long conversation history, or asks for large generated outputs such as code, reports, SQL, or structured JSON.

For agents, multiply the per-request estimate by the average number of model steps. A five-step agent costs roughly five times as much as a single chat completion with the same token mix.
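A rough sketch of the agent multiplication, using the GPT-5.5 rates from this page. The function names are hypothetical. The second function goes beyond the rule of thumb and is an assumption of this example: it models an agent that re-sends every earlier output as input on the next step, which is one common reason the real total ends up above the simple multiple.

```python
INPUT_RATE = 5.00 / 1_000_000    # USD per input token (GPT-5.5 row)
OUTPUT_RATE = 30.00 / 1_000_000  # USD per output token (GPT-5.5 row)


def step_cost(input_tokens, output_tokens):
    """Cost of a single model call with no cached input."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE


def agent_cost_same_mix(input_tokens, output_tokens, steps):
    """Rule of thumb: every step has the same token mix."""
    return steps * step_cost(input_tokens, output_tokens)


def agent_cost_growing_history(input_tokens, output_tokens, steps):
    """Assumption: each step re-sends all earlier outputs as extra input."""
    total = 0.0
    for step in range(steps):
        total += step_cost(input_tokens + step * output_tokens, output_tokens)
    return total


same_mix = agent_cost_same_mix(10_000, 1_000, steps=5)
growing = agent_cost_growing_history(10_000, 1_000, steps=5)
print(f"same mix: ${same_mix:.2f}, growing history: ${growing:.2f}")
# prints: same mix: $0.40, growing history: $0.45
```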

Good Workloads To Estimate

  • RAG answers with long retrieved context
  • AI coding tasks with large output files
  • Customer support agents with conversation memory
  • Batch summarization jobs
  • Workflow automations with repeated prompts

Pricing source: OpenAI pricing page. Last local rule update: 2026-05-29. Estimates exclude taxes, discounts, tools, and special processing fees.

FAQ

What are input tokens?

Input tokens are the text, tool context, retrieval snippets, instructions, and conversation history sent to the model before it generates a response.

What are output tokens?

Output tokens are the tokens generated by the model. They are usually priced higher than input tokens (six times the input rate for every model in the table above), so long answers can dominate the final request cost.
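To see how quickly output takes over, the share of a request's cost that comes from output can be computed directly. This is an illustrative sketch with a hypothetical helper name, using the GPT-5.5 rates from this page and ignoring cached input.

```python
def output_share(input_tokens, output_tokens, input_rate=5.00, output_rate=30.00):
    """Fraction of request cost attributable to output tokens.

    Rates are USD per 1M tokens; the per-million factor cancels in the ratio.
    Assumes at least one token overall, otherwise the ratio is undefined.
    """
    input_cost = input_tokens * input_rate
    output_cost = output_tokens * output_rate
    return output_cost / (input_cost + output_cost)


print(output_share(10_000, 1_000))  # 0.375
print(output_share(6_000, 1_000))   # 0.5
```

At these rates, output accounts for half the bill once the answer is one sixth the length of the prompt.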

Why does cached input matter?

Prompt caching can reduce repeated context costs when the same prefix is reused across requests. In the rates above, cached input is billed at one tenth of the standard input rate. The calculator separates cached input so agent and RAG workloads can estimate repeat traffic more accurately.
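One way to fold caching into an estimate is to weight the reusable prefix by an expected cache hit rate. The sketch below is an assumption-laden illustration: `cache_hit_rate` and the function name are hypothetical, the rates come from the GPT-5.5 row, and the provider's actual eligibility rules for prompt caching are not modeled.

```python
def input_cost_with_cache(prefix_tokens, fresh_tokens, cache_hit_rate,
                          input_rate=5.00, cached_rate=0.50):
    """Expected input cost in USD per request.

    prefix_tokens: reusable prompt prefix (system prompt, shared context).
    fresh_tokens: input that changes on every request and is never cached.
    cache_hit_rate: assumed fraction of requests whose prefix is served from cache.
    Rates are USD per 1M tokens.
    """
    cached = prefix_tokens * cache_hit_rate
    uncached = fresh_tokens + prefix_tokens * (1 - cache_hit_rate)
    return (uncached * input_rate + cached * cached_rate) / 1_000_000


no_cache = input_cost_with_cache(8_000, 2_000, cache_hit_rate=0.0)
mostly_cached = input_cost_with_cache(8_000, 2_000, cache_hit_rate=0.75)
print(f"${no_cache:.3f} without caching, ${mostly_cached:.3f} at a 75% hit rate")
# prints: $0.050 without caching, $0.023 at a 75% hit rate
```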

How do I estimate GPT-5 cost for a RAG app?

Estimate the average retrieved context size, the user message size, the expected answer length, and the number of model calls per user request. Then multiply the calculator result by your request volume.
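The steps above can be sketched as a small Python helper. The `RagWorkload` class, its field names, and the example numbers are all hypothetical; the rates are the GPT-5.5 figures from this page, and cached input is left out for simplicity.

```python
from dataclasses import dataclass


@dataclass
class RagWorkload:
    retrieved_context_tokens: int  # average retrieved context per model call
    user_message_tokens: int       # average user message size
    answer_tokens: int             # expected answer length
    calls_per_request: int         # model calls per user request
    requests: int                  # request volume for the period


def workload_cost(w, input_rate=5.00, output_rate=30.00):
    """Estimated USD cost for the whole workload; rates are USD per 1M tokens."""
    input_tokens = w.retrieved_context_tokens + w.user_message_tokens
    per_call = (input_tokens * input_rate + w.answer_tokens * output_rate) / 1_000_000
    return per_call * w.calls_per_request * w.requests


# Example numbers are made up for illustration.
workload = RagWorkload(
    retrieved_context_tokens=6_000,
    user_message_tokens=200,
    answer_tokens=500,
    calls_per_request=2,
    requests=50_000,
)
total = workload_cost(workload)
print(f"${total:,.2f}")  # prints: $4,600.00
```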

Why can per-request GPT-5 cost change so much?

The final cost changes with prompt length, output length, cache hit rate, model tier, tool usage, batching, and provider-side pricing changes.