GPT-5 Cost Calculator
Estimate OpenAI GPT-5 family API spend before shipping an agent, RAG flow, automation, or batch job. Pricing is read from the OpenAI JSON rule file, not hardcoded in this page.
GPT-5.5 Token Cost Calculator
Estimated API cost: $0.08
- Input: $0.05
- Cached input: $0.00
- Output: $0.03
- Per 1,000 requests: $80.00
Cost Per Million Tokens
| Model | Input | Cached input | Output | Context |
|---|---|---|---|---|
| GPT-5.5 | $5 | $0.5 | $30 | Provider dependent |
| GPT-5.4 | $2.5 | $0.25 | $15 | Provider dependent |
| GPT-5.4 Mini | $0.75 | $0.075 | $4.5 | Provider dependent |
How GPT-5 Token Pricing Works
An OpenAI API charge is estimated by multiplying billable input tokens by the input rate and generated output tokens by the output rate, then adding the two. Rates are quoted per million tokens, so each product is divided by 1,000,000. Cached input tokens can be billed at the lower cached-input rate when the request qualifies for prompt caching.
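The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation: the function name `estimate_request_cost` and the token counts are assumptions, and the rates are copied from the GPT-5.5 row of the table above.

```python
# Hypothetical helper. Rates are USD per one million tokens (GPT-5.5 row).
GPT_5_5_RATES = {"input": 5.00, "cached_input": 0.50, "output": 30.00}


def estimate_request_cost(input_tokens, cached_tokens, output_tokens, rates):
    """Return the estimated USD cost of a single request.

    input_tokens  -- uncached input tokens, billed at the input rate
    cached_tokens -- input tokens that qualify for the cached-input rate
    output_tokens -- tokens generated by the model
    """
    # Convert each per-million rate into a per-token rate.
    per_token = {name: rate / 1_000_000 for name, rate in rates.items()}
    return (
        input_tokens * per_token["input"]
        + cached_tokens * per_token["cached_input"]
        + output_tokens * per_token["output"]
    )


# Assumed token mix: 10,000 uncached input tokens and 1,000 output tokens.
cost = estimate_request_cost(10_000, 0, 1_000, GPT_5_5_RATES)
print(f"${cost:.2f} per request, ${cost * 1000:.2f} per 1,000 requests")
```

With this assumed token mix the sketch prints $0.08 per request and $80.00 per 1,000 requests, which happens to match the sample figures shown in the calculator.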
When GPT-5 Gets Expensive
GPT-5 cost rises fastest when an app sends large retrieved documents, keeps long conversation history, or asks for large generated outputs such as code, reports, SQL, or structured JSON.
For agents, multiply the per-request estimate by the average number of model steps. A five-step agent can cost roughly five times as much as a single chat completion, assuming each step has a similar token mix.
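As a rough sketch of that multiplication, the snippet below scales a single-call estimate by an average step count. The function name `estimate_agent_cost` and the $0.08 per-step figure are illustrative assumptions, and the model is deliberately flat: it assumes every step has the same token mix.

```python
def estimate_agent_cost(per_step_cost, avg_steps):
    """Scale a single-call USD estimate by the average number of model steps."""
    if avg_steps < 1:
        raise ValueError("avg_steps must be at least 1")
    return per_step_cost * avg_steps


single_call = 0.08  # assumed per-request estimate in USD
agent_run = estimate_agent_cost(single_call, avg_steps=5)
print(f"${agent_run:.2f} per agent run")
```

If later steps carry more conversation history than earlier ones, a per-step estimate averaged over real traces would likely be a better input than a single first-call figure.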
Good Workloads To Estimate
- RAG answers with long retrieved context
- AI coding tasks with large output files
- Customer support agents with conversation memory
- Batch summarization jobs
- Workflow automations with repeated prompts
Pricing source: OpenAI pricing page. Last local rule update: 2026-05-29. Estimates exclude taxes, discounts, tools, and special processing fees.
FAQ
What are input tokens?
Input tokens are the text, tool context, retrieval snippets, instructions, and conversation history sent to the model before it generates a response.
What are output tokens?
Output tokens are the tokens generated by the model. They are usually priced higher than input tokens (in the table above, each output rate is six times the matching input rate), so long answers can dominate the final request cost.
Why does cached input matter?
Prompt caching can reduce repeated context costs when the same prefix is reused across requests. The calculator separates cached input so agent and RAG workloads can estimate repeat traffic more accurately.
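One way to reason about repeat traffic is to blend the input and cached-input rates by an expected cache hit rate. The sketch below does that for the input side only; the function name, the prefix and fresh token counts, and the 90% hit rate are assumptions for illustration, and the rates come from the GPT-5.5 row of the table above.

```python
INPUT_RATE = 5.00         # USD per million tokens (GPT-5.5 row)
CACHED_INPUT_RATE = 0.50  # USD per million tokens (GPT-5.5 row)


def input_cost_with_cache(prefix_tokens, fresh_tokens, cache_hit_rate):
    """Expected input cost in USD for one request.

    prefix_tokens  -- shared prefix that is reused across requests
    fresh_tokens   -- per-request tokens that are never cached
    cache_hit_rate -- assumed fraction of requests (0.0 to 1.0) whose
                      prefix is billed at the cached-input rate
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    # Blend the two rates for the prefix according to the hit rate.
    prefix_rate = (
        cache_hit_rate * CACHED_INPUT_RATE + (1 - cache_hit_rate) * INPUT_RATE
    )
    return (prefix_tokens * prefix_rate + fresh_tokens * INPUT_RATE) / 1_000_000


no_cache = input_cost_with_cache(8_000, 2_000, cache_hit_rate=0.0)
mostly_cached = input_cost_with_cache(8_000, 2_000, cache_hit_rate=0.9)
print(f"no cache: ${no_cache:.4f}, 90% hit rate: ${mostly_cached:.4f}")
```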
How do I estimate GPT-5 cost for a RAG app?
Estimate the average retrieved context size, the user message size, the expected answer length, and the number of model calls per user request. Then multiply the calculator result by your request volume.
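The steps above can be sketched as a small Python estimate. The `RagWorkload` fields, the function name, and every number in the example workload are hypothetical; the rates are the GPT-5.4 row of the pricing table, and caching is ignored to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass
class RagWorkload:
    retrieved_context_tokens: int  # average retrieved context per model call
    user_message_tokens: int       # average user message size
    answer_tokens: int             # expected answer length
    calls_per_user_request: int    # model calls made for each user request
    user_requests: int             # request volume for the period


def estimate_rag_cost(workload, input_rate, output_rate):
    """Return total USD cost; rates are USD per million tokens."""
    input_tokens = workload.retrieved_context_tokens + workload.user_message_tokens
    per_call = (
        input_tokens * input_rate + workload.answer_tokens * output_rate
    ) / 1_000_000
    return per_call * workload.calls_per_user_request * workload.user_requests


# Assumed workload: 6,000 context tokens, 200-token question, 500-token
# answer, 2 model calls per user request, 50,000 user requests.
workload = RagWorkload(6_000, 200, 500, 2, 50_000)
total = estimate_rag_cost(workload, input_rate=2.50, output_rate=15.00)
print(f"${total:,.2f} for {workload.user_requests:,} user requests")
```

Under these assumptions each model call costs about $0.023, so the total comes to roughly $2,300 for the period.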
Why can per-request GPT-5 cost change so much?
The final cost changes with prompt length, output length, cache hit rate, model tier, tool usage, batching, and provider-side pricing changes.