Token Counter + LLM Cost Calculator

System Prompt Token Counter: Estimate Token Usage

Count tokens in LLM system prompts. Estimate how much of the context window your system prompt consumes across GPT-4o, Claude, and Gemini.

100% client-side. Your data never leaves your browser.

Token Count

0tokens

Text Stats

Characters772
Words119
Lines14

Estimated Cost

As input< $0.0001
As output< $0.0001
Context used0.00%

GPT-5.5 · 1.1M context

Pricing as of 2026-05

Related Tools

System Prompt Token Counter

The system prompt in this example is approximately 180 tokens. That sounds small, but system prompts in production applications routinely run 500 to 3000 tokens once you add tool schemas, persona instructions, guardrails, and few-shot examples. Counting tokens before deployment helps you budget the context window accurately across GPT-4o, Claude, and Gemini.

Understanding Context Window Budget

Every LLM API call has a context window limit. The total tokens across all messages in the request must fit within that limit:

context_window = system_prompt + conversation_history + tool_definitions + response
ModelContext Window
GPT-4o128k tokens
GPT-4o mini128k tokens
Claude 3.5 Sonnet200k tokens
Claude 3 Haiku200k tokens
Gemini 1.5 Pro1M tokens
Gemini 1.5 Flash1M tokens

For chat applications, conversation history grows with each turn. A 500 token system prompt on a 128k window is fine for turn 1, but a long conversation can fill the window regardless of system prompt size. The system prompt is fixed cost; conversation history is variable cost.

Budget Guidelines by Application Type

Interactive chat and agents

Keep system prompts under 10 to 15% of the context window. This preserves space for conversation history and multi-step reasoning. A 1000 token system prompt is not inherently a problem on a 128k model, but combine it with a 50-turn conversation, verbose tool outputs, and a long code generation response, and you approach the limit.

Single-turn classification and extraction

System prompt length matters less here because conversation history does not accumulate. You can use longer prompts with many examples if they improve accuracy.

Tool-heavy agents

Tool definitions consume tokens. The OpenAI and Anthropic APIs accept tool schemas as part of the request; these count against the context window. A complex agent with 10 tools and detailed parameter descriptions can have 2000+ tokens in tool schemas alone before the system prompt.

Token Reduction Techniques

Remove redundant instructions

If your system prompt says “Be helpful, concise, and professional” and also “Respond in a helpful and professional manner with clear, concise answers,” that is the same instruction written twice. One version is enough.

Trim verbose phrasing

Compare:

Before: “It is of the utmost importance that you always remember to greet the customer by their first name whenever their name is available to you in the conversation.”

After: “Greet the customer by name when available.”

The shorter version conveys identical instructions at roughly a quarter of the token count.

Move examples out of the system prompt

Few-shot examples are effective but expensive. A system prompt with five examples at 200 tokens each adds 1000 tokens of fixed cost to every request. Alternatives:

Use concise tool schemas

Tool descriptions should describe what the tool does and what its parameters mean, not write a paragraph about each one. Compare:

{
  "description": "This tool allows you to look up information about a specific order by its unique identifier. You should use this tool whenever a customer asks about the status of their order, shipping information, or the items they purchased.",
  "name": "order_lookup"
}

versus:

{
  "description": "Returns order status, shipping info, and line items for a given order_id.",
  "name": "order_lookup"
}

The second version is more useful to the model and uses fewer tokens.

Prompt Caching

If your system prompt is stable across requests, prompt caching eliminates most of the repeated input token cost.

Anthropic

Mark the end of your system prompt with a cache control breakpoint. Subsequent requests with the same prefix are served from cache at 10% of the normal input price for cache reads (cache write is 25% more expensive than normal input, amortized over subsequent reads).

system_prompt = [
  {
    "type": "text",
    "text": "You are a helpful customer support agent...",
    "cache_control": {"type": "ephemeral"}
  }
]

Cache entries last 5 minutes with automatic refresh on use.

OpenAI

OpenAI applies prompt caching automatically for prompts over 1024 tokens. Cached token reads are billed at 50% of normal input price. You do not need to opt in, but you do need a stable prefix. Varying the system prompt between requests defeats the cache.

When caching matters

At $3 per million input tokens (GPT-4o pricing), a 1000 token system prompt costs $0.003 per request. At 1 million requests per day, that is $3000 per day, or $90,000 per month. With 50% caching, that drops to $1500 per day. The savings scale directly with request volume and system prompt length.