Skip to Content
WikiBillingToken Usage Explained

Token Usage Explained

Understand the different token types in API calls and how they’re billed to help you estimate and control costs effectively.

What is a Token

A token is the basic unit that AI models use to process text. You can think of it this way:

  • English: roughly 1 word equals 1 token
  • Chinese: roughly 1.5 to 2 characters equal 1 token

Every API call consumes tokens, including what you send (input) and what the model returns (output).


Basic Token Types

Input Tokens

All content you send to the model, including:

  • System prompts
  • User messages
  • Conversation history

Output Tokens

All content generated by the model in response.

Here’s an example:

Input: "Introduce Beijing in one sentence" (~10 tokens) Output: "Beijing is the capital of China, with a rich history and cultural heritage." (~15 tokens)

Advanced Token Types

Beyond basic input/output, modern AI models support richer capabilities with different token types.

Cache Tokens

When you send the same or similar content multiple times, the model can reuse previously processed results. This means faster responses and lower costs.

Token TypeDescriptionTypical Price Ratio
Cache CreationFirst-time cache creation~125% of input price
Cache ReadCache hits~10% of input price

When to use:

  • Long system prompts, such as role definitions or knowledge base content
  • Historical messages in multi-turn conversations
  • Repeated document analysis tasks

Here’s an example:

First request: Input: [Long document 10000 tokens] + "Summarize this document" Consumes: 10000 input + cache creation Second request (same document): Input: [Same document] + "Extract key data" Consumes: 100 cache read + 200 new input Significant cost savings

Reasoning Tokens

Some advanced models (like OpenAI o1, Claude with extended thinking) “think” before generating responses, consuming additional reasoning tokens.

Token TypeDescriptionTypical Price
Reasoning TokensModel’s internal thinking process~2-4x output price

A few things to note:

  • The reasoning process is invisible to users
  • Best suited for complex problems: math, coding, logic
  • Models automatically balance quality, speed, and cost

When to use:

  • Complex mathematical calculations
  • Code generation and debugging
  • Multi-step reasoning problems

Image Tokens

When you send images to vision-capable models, images are converted to tokens for processing.

Token TypeDescriptionCalculation
Image InputImage recognition and analysisBased on image dimensions

Calculation rules:

  • Larger images consume more tokens
  • High detail mode consumes more tokens
  • Supported formats: PNG, JPEG, WebP, GIF

Here’s an example:

Request: "What's in this image?" + [1024x1024 image] Consumes: ~765 image input tokens + text input tokens

Audio Tokens

Audio input/output (voice conversations) consume audio tokens.

Token TypeDescriptionTypical Use
Audio InputSpeech recognitionSpeech-to-text
Audio OutputSpeech synthesisText-to-speech

Token Billing Formula

Your actual cost is determined by all token types:

Total Cost = Input Cost + Output Cost + Cache Cost + Reasoning Cost + Image Cost + Audio Cost
Cost ItemCalculation
Input CostText input tokens x Input price
Output CostText output tokens x Output price
Cache Creation CostCache creation tokens x Cache creation price (or fallback to input price)
Cache Read CostCache read tokens x Cache read price (or fallback to input price)
Reasoning CostReasoning tokens x Reasoning price (or fallback to output price)
Image CostImage tokens x Image price (or fallback to input price)
Audio Input CostAudio input tokens x Audio input price (or fallback to input price)
Audio Output CostAudio output tokens x Audio output price (or fallback to output price)

Fallback mechanism: If a model doesn’t have a specific price set, the system automatically uses the base price (input or output price) to ensure billing works correctly.


How to View Token Usage

View in Dashboard

  1. Log in to MyTokenGate Dashboard
  2. Go to the Usage Statistics page
  3. View detailed consumption by token type

View in API Response

Each API response includes a usage field:

{ "usage": { "prompt_tokens": 1000, "completion_tokens": 500, "total_tokens": 1500, "prompt_tokens_details": { "cached_tokens": 300, "audio_tokens": 0, "image_tokens": 0 }, "completion_tokens_details": { "reasoning_tokens": 200, "audio_tokens": 0 } } }

Field descriptions:

  • prompt_tokens: Total input tokens
  • completion_tokens: Total output tokens
  • cached_tokens: Cache hit tokens
  • reasoning_tokens: Tokens consumed during reasoning
  • audio_tokens: Audio-related tokens
  • image_tokens: Image-related tokens

Cost-Saving Tips

Use Caching Effectively

  • Place long prompts at the beginning of conversations
  • Reuse the same system prompt across multi-turn conversations
  • Batch process similar requests

Choose Models Wisely

  • Use lightweight models for simple tasks, such as GPT-4o-mini
  • Reserve reasoning models for complex tasks, such as o1 or Claude with extended thinking

Control Output Length

  • Set max_tokens to limit maximum output
  • Use concise prompts to guide the model toward brief responses

Optimize Image Dimensions

  • Use appropriately sized images, avoid oversized ones
  • Use high detail only when needed, otherwise use low or auto

Frequently Asked Questions

Why is my cost higher than expected?

Possible reasons:

  1. Using a reasoning-capable model consumes additional tokens for the thinking process
  2. Sending large images
  3. Multi-turn conversations accumulate significant context

Does caching always save money?

Yes. While cache creation costs slightly more (~125% of input price), cache reads only cost ~10% of input price. From the second request onwards, costs drop significantly, and response times are faster too.

How do I know if a model supports a capability?

Check the Models Reference page. Each model shows supported capabilities, including vision, audio, and reasoning.


Last updated on