Token Usage Explained
Understand the different token types in API calls and how they’re billed to help you estimate and control costs effectively.
What is a Token
A token is the basic unit that AI models use to process text. You can think of it this way:
- English: roughly 1 word equals 1 token
- Chinese: roughly 1.5 to 2 characters equal 1 token
Every API call consumes tokens, including what you send (input) and what the model returns (output).
Basic Token Types
Input Tokens
All content you send to the model, including:
- System prompts
- User messages
- Conversation history
Output Tokens
All content generated by the model in response.
Here’s an example:
Input: "Introduce Beijing in one sentence" (~10 tokens)
Output: "Beijing is the capital of China, with a rich history and cultural heritage." (~15 tokens)Advanced Token Types
Beyond basic input/output, modern AI models support richer capabilities with different token types.
Cache Tokens
When you send the same or similar content multiple times, the model can reuse previously processed results. This means faster responses and lower costs.
| Token Type | Description | Typical Price Ratio |
|---|---|---|
| Cache Creation | First-time cache creation | ~125% of input price |
| Cache Read | Cache hits | ~10% of input price |
When to use:
- Long system prompts, such as role definitions or knowledge base content
- Historical messages in multi-turn conversations
- Repeated document analysis tasks
Here’s an example:
First request:
Input: [Long document 10000 tokens] + "Summarize this document"
Consumes: 10000 input + cache creation
Second request (same document):
Input: [Same document] + "Extract key data"
Consumes: 100 cache read + 200 new input
Significant cost savingsReasoning Tokens
Some advanced models (like OpenAI o1, Claude with extended thinking) “think” before generating responses, consuming additional reasoning tokens.
| Token Type | Description | Typical Price |
|---|---|---|
| Reasoning Tokens | Model’s internal thinking process | ~2-4x output price |
A few things to note:
- The reasoning process is invisible to users
- Best suited for complex problems: math, coding, logic
- Models automatically balance quality, speed, and cost
When to use:
- Complex mathematical calculations
- Code generation and debugging
- Multi-step reasoning problems
Image Tokens
When you send images to vision-capable models, images are converted to tokens for processing.
| Token Type | Description | Calculation |
|---|---|---|
| Image Input | Image recognition and analysis | Based on image dimensions |
Calculation rules:
- Larger images consume more tokens
- High detail mode consumes more tokens
- Supported formats: PNG, JPEG, WebP, GIF
Here’s an example:
Request: "What's in this image?" + [1024x1024 image]
Consumes: ~765 image input tokens + text input tokensAudio Tokens
Audio input/output (voice conversations) consume audio tokens.
| Token Type | Description | Typical Use |
|---|---|---|
| Audio Input | Speech recognition | Speech-to-text |
| Audio Output | Speech synthesis | Text-to-speech |
Token Billing Formula
Your actual cost is determined by all token types:
Total Cost = Input Cost + Output Cost + Cache Cost + Reasoning Cost + Image Cost + Audio Cost| Cost Item | Calculation |
|---|---|
| Input Cost | Text input tokens x Input price |
| Output Cost | Text output tokens x Output price |
| Cache Creation Cost | Cache creation tokens x Cache creation price (or fallback to input price) |
| Cache Read Cost | Cache read tokens x Cache read price (or fallback to input price) |
| Reasoning Cost | Reasoning tokens x Reasoning price (or fallback to output price) |
| Image Cost | Image tokens x Image price (or fallback to input price) |
| Audio Input Cost | Audio input tokens x Audio input price (or fallback to input price) |
| Audio Output Cost | Audio output tokens x Audio output price (or fallback to output price) |
Fallback mechanism: If a model doesn’t have a specific price set, the system automatically uses the base price (input or output price) to ensure billing works correctly.
How to View Token Usage
View in Dashboard
- Log in to MyTokenGate Dashboard
- Go to the Usage Statistics page
- View detailed consumption by token type
View in API Response
Each API response includes a usage field:
{
"usage": {
"prompt_tokens": 1000,
"completion_tokens": 500,
"total_tokens": 1500,
"prompt_tokens_details": {
"cached_tokens": 300,
"audio_tokens": 0,
"image_tokens": 0
},
"completion_tokens_details": {
"reasoning_tokens": 200,
"audio_tokens": 0
}
}
}Field descriptions:
prompt_tokens: Total input tokenscompletion_tokens: Total output tokenscached_tokens: Cache hit tokensreasoning_tokens: Tokens consumed during reasoningaudio_tokens: Audio-related tokensimage_tokens: Image-related tokens
Cost-Saving Tips
Use Caching Effectively
- Place long prompts at the beginning of conversations
- Reuse the same system prompt across multi-turn conversations
- Batch process similar requests
Choose Models Wisely
- Use lightweight models for simple tasks, such as GPT-4o-mini
- Reserve reasoning models for complex tasks, such as o1 or Claude with extended thinking
Control Output Length
- Set max_tokens to limit maximum output
- Use concise prompts to guide the model toward brief responses
Optimize Image Dimensions
- Use appropriately sized images, avoid oversized ones
- Use high detail only when needed, otherwise use low or auto
Frequently Asked Questions
Why is my cost higher than expected?
Possible reasons:
- Using a reasoning-capable model consumes additional tokens for the thinking process
- Sending large images
- Multi-turn conversations accumulate significant context
Does caching always save money?
Yes. While cache creation costs slightly more (~125% of input price), cache reads only cost ~10% of input price. From the second request onwards, costs drop significantly, and response times are faster too.
How do I know if a model supports a capability?
Check the Models Reference page. Each model shows supported capabilities, including vision, audio, and reasoning.
Related Links
- Models Reference - View model capabilities and prices
- API Documentation - Learn about API calls
- FAQ - More questions answered