Question 1

What are input tokens?

Accepted Answer

Input tokens are the tokenized form of everything sent to the model before it generates a response: instructions, conversation history, the user message, retrieval snippets, and tool context.

Question 2

What are output tokens?

Accepted Answer

Output tokens are the tokens generated by the model. They are usually priced higher than input tokens (six times the input rate for every model in the pricing table below), so long answers can dominate the final request cost.
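
As a rough illustration of how output can dominate, the sketch below computes the share of a request's cost that comes from output tokens. The rates are the GPT-5.4 figures from the pricing table on this page, read as USD per 1M tokens (an assumption), and the token counts and the function name are made up for the example.

```python
# Assumed rates: GPT-5.4 row of the pricing table, read as USD per 1M tokens.
INPUT_PRICE_PER_M = 2.50
OUTPUT_PRICE_PER_M = 15.00


def output_cost_share(input_tokens: int, output_tokens: int) -> float:
    """Return the fraction of a request's cost that comes from output tokens."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return output_cost / (input_cost + output_cost)


# Hypothetical request: 2,000 input tokens and a 1,000-token answer.
share = output_cost_share(input_tokens=2_000, output_tokens=1_000)
print(f"Output share of cost: {share:.0%}")
```

With these illustrative numbers the answer is half the length of the prompt, yet it accounts for about three quarters of the cost.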

Question 3

Why does cached input matter?

Accepted Answer

Prompt caching can reduce the cost of repeated context when the same prefix is reused across requests. In the pricing table below, cached input is billed at one tenth of the standard input rate. The calculator separates cached input so agent and RAG workloads can estimate repeat traffic more accurately.
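
A minimal sketch of that separation, assuming cached and uncached input tokens are simply billed at two different per-million rates. The function name and the token counts are hypothetical; the rates are the GPT-5.5 figures from the pricing table on this page, read as USD per 1M tokens.

```python
def input_cost(
    total_input_tokens: int,
    cached_tokens: int,
    input_price_per_m: float,
    cached_price_per_m: float,
) -> float:
    """Input cost in USD when part of the prompt is served from the cache."""
    if not 0 <= cached_tokens <= total_input_tokens:
        raise ValueError("cached_tokens must be between 0 and total_input_tokens")
    uncached_tokens = total_input_tokens - cached_tokens
    return (
        uncached_tokens * input_price_per_m + cached_tokens * cached_price_per_m
    ) / 1_000_000


# Hypothetical agent call: 10,000 input tokens, 8,000 of them a reused prefix.
with_cache = input_cost(10_000, 8_000, input_price_per_m=5.00, cached_price_per_m=0.50)
without_cache = input_cost(10_000, 0, input_price_per_m=5.00, cached_price_per_m=0.50)
print(f"with cache: ${with_cache:.4f}, without cache: ${without_cache:.4f}")
```

The larger the reused prefix relative to the fresh part of the prompt, the closer the input cost gets to the cached rate.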

Question 4

How do I estimate GPT-5 cost for a RAG app?

Accepted Answer

Estimate the average retrieved context size, the user message size, the expected answer length, and the number of model calls per user request. Then multiply the calculator result by your request volume.
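
The steps above can be sketched as a small estimator. This is a simplification, not the calculator's actual implementation: it assumes every model call in a request sends the same context and produces an answer of the same length, and it ignores caching. The class, field names, and workload numbers are hypothetical; the rates are the GPT-5.4 Mini figures from the pricing table on this page, read as USD per 1M tokens.

```python
from dataclasses import dataclass


@dataclass
class RagWorkload:
    retrieved_context_tokens: int  # average retrieved context per model call
    user_message_tokens: int       # average user message size
    answer_tokens: int             # expected answer length
    model_calls_per_request: int   # model calls triggered by one user request
    requests_per_month: int        # request volume


def monthly_cost(w: RagWorkload, input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimated monthly cost in USD for the given workload and per-1M-token rates."""
    input_tokens_per_call = w.retrieved_context_tokens + w.user_message_tokens
    cost_per_call = (
        input_tokens_per_call * input_price_per_m + w.answer_tokens * output_price_per_m
    ) / 1_000_000
    cost_per_request = cost_per_call * w.model_calls_per_request
    return cost_per_request * w.requests_per_month


workload = RagWorkload(
    retrieved_context_tokens=3_000,
    user_message_tokens=200,
    answer_tokens=500,
    model_calls_per_request=2,
    requests_per_month=100_000,
)
print(f"${monthly_cost(workload, input_price_per_m=0.75, output_price_per_m=4.50):,.2f} per month")
```

Replace the averages with measurements from real traffic where you have them; retrieved context size in particular tends to vary a lot between queries.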

Question 5

Why can per-request GPT-5 cost change so much?

Accepted Answer

The final cost changes with prompt length, output length, cache hit rate, model tier, tool usage, batching, and provider-side pricing changes.

Model	Input ($/1M tokens)	Cached input ($/1M tokens)	Output ($/1M tokens)	Context window
GPT-5.5	$5.00	$0.50	$30.00	Provider dependent
GPT-5.4	$2.50	$0.25	$15.00	Provider dependent
GPT-5.4 Mini	$0.75	$0.075	$4.50	Provider dependent
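
To compare tiers for the same request, the table can be turned into a lookup. The sketch below assumes the prices are USD per 1M tokens and that uncached input, cached input, and output are each billed at their own rate. The dictionary layout, function name, and token counts are illustrative only.

```python
# Prices copied from the table above, assumed to be USD per 1M tokens.
PRICES = {
    "GPT-5.5": {"input": 5.00, "cached_input": 0.50, "output": 30.00},
    "GPT-5.4": {"input": 2.50, "cached_input": 0.25, "output": 15.00},
    "GPT-5.4 Mini": {"input": 0.75, "cached_input": 0.075, "output": 4.50},
}


def request_cost(
    model: str,
    uncached_input_tokens: int,
    cached_input_tokens: int,
    output_tokens: int,
) -> float:
    """Cost in USD of one request for the given model and token counts."""
    price = PRICES[model]
    return (
        uncached_input_tokens * price["input"]
        + cached_input_tokens * price["cached_input"]
        + output_tokens * price["output"]
    ) / 1_000_000


# Hypothetical request: 4,000 fresh input tokens, 6,000 cached, 800 output.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 4_000, 6_000, 800):.5f}")
```

Because provider pricing changes, treat the dictionary as a snapshot and update it before relying on the numbers.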
