Question 1

What exactly is a token?

Accepted Answer

A token is a chunk of text that the model processes as a unit. For most English text, one token is roughly 3-4 characters or about 0.75 words. Common words like 'the', 'is', and 'and' are typically one token each. Uncommon words, proper nouns, and code identifiers are often split across multiple tokens. Whitespace, punctuation, and newlines also consume tokens. The exact boundaries depend on the tokenizer the model uses.
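The rules of thumb above can be turned into a quick estimator. This is a minimal sketch, not a tokenizer: the function name `estimate_tokens` is hypothetical, it assumes ordinary English prose, and it averages the characters-per-token and words-per-token heuristics. Real counts depend on the model's tokenizer and can differ noticeably for code, proper nouns, or non-English text.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English prose; not a real tokenizer."""
    # Heuristic 1: about 4 characters per token (upper end of the 3-4 range).
    by_chars = len(text) / 4
    # Heuristic 2: about 0.75 words per token.
    by_words = len(text.split()) / 0.75
    # Average the two heuristics to smooth out very short or very long words.
    return round((by_chars + by_words) / 2)


if __name__ == "__main__":
    sample = "Common words like 'the', 'is', and 'and' are typically one token each."
    print(estimate_tokens(sample))
```

An estimate like this is useful for a sanity check before a prompt goes anywhere near an API; for billing or context-limit decisions, count with the actual tokenizer.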

Question 2

Why do tokens matter for API cost?

Accepted Answer

LLM API pricing is based on token count, typically with separate rates for input tokens and output tokens. Output tokens are usually 3-5x more expensive than input tokens; for the models in the pricing table below, the ratio is 4-5x. A prompt that is 100 tokens longer costs more on every single API call, and the difference compounds quickly if you are making thousands of calls per day. Measuring token count before deploying a prompt to production helps you catch unexpectedly long system prompts and estimate monthly costs accurately.
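To make the compounding concrete, here is a small cost calculator. It is a sketch under stated assumptions: the rates are copied from the pricing table in this document and may be out of date, the function names `call_cost` and `monthly_cost` are hypothetical, and a month is assumed to be 30 days.

```python
# USD per 1M tokens as (input, output), copied from this document's pricing table.
# Treat these as a snapshot; check the provider's current pricing before relying on them.
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4o-mini": (0.15, 0.60),
    "Claude 3.5 Sonnet": (3.00, 15.00),
    "Claude 3 Haiku": (0.25, 1.25),
    "Gemini 1.5 Flash": (0.075, 0.30),
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one API call with the given token counts."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def monthly_cost(model: str, input_tokens: int, output_tokens: int,
                 calls_per_day: int, days: int = 30) -> float:
    """Cost in USD of repeating the same call every day for `days` days."""
    return call_cost(model, input_tokens, output_tokens) * calls_per_day * days


if __name__ == "__main__":
    # Extra cost of 100 additional input tokens at 10,000 calls per day.
    print(f"${monthly_cost('GPT-4o', 100, 0, 10_000):.2f} per month")
```

At 10,000 calls per day, 100 extra input tokens add up to 1M tokens per day, which at the GPT-4o input rate above is $2.50 per day.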

Question 3

How are tokens counted: does every model use the same method?

Accepted Answer

No. Each model family uses its own tokenizer. GPT-3.5 and GPT-4 use cl100k_base (a vocabulary of ~100,000 tokens). GPT-4o and GPT-4o-mini use o200k_base, which has a larger vocabulary (~200,000 tokens) and tends to encode the same text in fewer tokens. Claude uses its own tokenizer, which differs again. Gemini models use SentencePiece. The same text might count as 120 tokens in GPT-4 and 105 tokens in GPT-4o. Always use the tokenizer matching the model you are deploying to.
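One way to avoid counting with the wrong tokenizer is to look the encoding up from the model name. The sketch below covers only the OpenAI models named above; the helper names `encoding_name` and `count_tokens` are hypothetical, and the counting step assumes the third-party `tiktoken` package is installed. Claude and Gemini are deliberately left out because their tokenizers are not available through `tiktoken`.

```python
# Encoding names as stated in the answer above; other model families are not covered.
ENCODING_BY_PREFIX = {
    "gpt-4o": "o200k_base",
    "gpt-4": "cl100k_base",
    "gpt-3.5": "cl100k_base",
}


def encoding_name(model: str) -> str:
    """Return the tokenizer encoding name for an OpenAI model name."""
    # Check longer prefixes first so "gpt-4o-mini" matches "gpt-4o", not "gpt-4".
    for prefix in sorted(ENCODING_BY_PREFIX, key=len, reverse=True):
        if model.startswith(prefix):
            return ENCODING_BY_PREFIX[prefix]
    raise ValueError(f"no known tokenizer for model {model!r}")


def count_tokens(text: str, model: str) -> int:
    """Count tokens with the encoding that matches `model`."""
    import tiktoken  # third-party; assumed installed (pip install tiktoken)

    encoding = tiktoken.get_encoding(encoding_name(model))
    return len(encoding.encode(text))


if __name__ == "__main__":
    prompt = "Summarize the following support ticket in two sentences."
    for model in ("gpt-4", "gpt-4o"):
        print(model, encoding_name(model), count_tokens(prompt, model))
```

Raising an error for unknown models is a deliberate choice here: silently falling back to a default encoding would produce a count that looks plausible but belongs to a different tokenizer.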

Question 4

What is a context window, and how does it relate to tokens?

Accepted Answer

The context window is the maximum number of tokens a model can process in a single request, counting the input (prompt, conversation history, and documents) and the output together. GPT-4o has a 128,000-token context window; Claude 3.5 Sonnet supports up to 200,000 tokens. If your input exceeds the context window, the API returns an error. In practice, the context window is a budget: the system prompt, conversation history, injected documents, and the model's output all draw from the same pool.

Model	Input (per 1M tokens)	Output (per 1M tokens)
GPT-4o	$2.50	$10.00
GPT-4o-mini	$0.15	$0.60
Claude 3.5 Sonnet	$3.00	$15.00
Claude 3 Haiku	$0.25	$1.25
Gemini 1.5 Flash	$0.075	$0.30

Component	Typical allocation
System prompt	100 to 500 tokens
Conversation history (last N turns)	2,000 to 20,000 tokens
Injected documents / RAG context	5,000 to 50,000 tokens
Reserved for output	1,000 to 4,000 tokens
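The allocation table above can be checked against a model's context window before a request is sent. This is a minimal sketch: the function name `remaining_budget` and its parameter names are hypothetical, the token counts are whatever your tokenizer reports for each component, and the example uses the upper end of each range in the table with the 128,000-token window quoted for GPT-4o.

```python
def remaining_budget(context_window: int, system_prompt: int, history: int,
                     documents: int, reserved_output: int) -> int:
    """Tokens left in the context window after every component is accounted for.

    A negative result means the request does not fit and something must be trimmed.
    """
    used = system_prompt + history + documents + reserved_output
    return context_window - used


if __name__ == "__main__":
    # Upper end of each range in the allocation table, against a 128,000-token window.
    left = remaining_budget(
        context_window=128_000,
        system_prompt=500,
        history=20_000,
        documents=50_000,
        reserved_output=4_000,
    )
    print(f"{left} tokens of headroom" if left >= 0 else f"over budget by {-left} tokens")
```

Reserving the output allowance up front, rather than subtracting it last, keeps long inputs from crowding out the space the model needs for its response.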

Count Tokens in a ChatGPT Prompt: Token Counter

The Sample Prompt

What Is a Token?

Why Token Count Matters

API cost

Context window limits

Response quality

How Different Tokenizers Handle the Same Text

Practical Techniques for Reducing Token Count

Trim whitespace and redundancy

Use structured formats for data

Prefer system prompts for static instructions

Cache repeated context

The Context Window as a Budget