Interleaved Thinking - MyTokenGate
Important Notice
Some models support Interleaved Thinking. On MyTokenGate, DeepSeek R1 and GLM-5.1 may return Interleaved Thinking output in a dedicated reasoning_content field when called through the serverless model API, most commonly in tool-calling flows. To keep tool calling correct and stable, follow the guidelines below.
Overview
On MyTokenGate, Interleaved Thinking is currently supported by:
- deepseek-r1 - DeepSeek reasoning model
- glm-5.1 - GLM model
See the Model List for all available models.
Interleaved Thinking is especially useful for:
- Agent-style orchestration
- Tool-calling scenarios
- Coding and debugging
- Multi-step tasks that benefit from intermediate tool outputs
1. What is Interleaved Thinking?
With Interleaved Thinking, a model can:
- Decide whether it needs to call a tool
- Call a tool
- Receive tool results
- Continue from intermediate outputs
- Decide the next step (call another tool or produce a final answer)
This enables robust multi-step execution where tool outputs can influence subsequent steps.
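The loop described above can be sketched without any network call. This is a minimal sketch under stated assumptions: `fake_model`, `get_weather`, and `run_turn` are hypothetical names used only for illustration and are not part of the MyTokenGate API. A real client would call the chat completions endpoint where `fake_model` is used.

```python
import json

def get_weather(city):
    # Hypothetical local tool; a real tool would query a weather service.
    return {"city": city, "weather": "Sunny", "temp": "25C"}

def fake_model(messages):
    # Stand-in for the model (assumption for illustration only):
    # it requests the tool once, then answers after seeing the tool result.
    if not any(m["role"] == "tool" for m in messages):
        return {
            "role": "assistant",
            "content": "",
            "reasoning_content": "I need current data, so I will call get_weather.",
            "tool_calls": [{
                "id": "call_1",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": json.dumps({"city": "Paris"}),
                },
            }],
        }
    result = json.loads(messages[-1]["content"])
    return {
        "role": "assistant",
        "content": f"It is {result['weather']} in {result['city']}.",
        "reasoning_content": "The tool returned the weather, so I can answer now.",
    }

def run_turn(messages, max_steps=5):
    for _ in range(max_steps):
        reply = fake_model(messages)
        messages.append(reply)  # keep the whole reply, reasoning_content included
        if not reply.get("tool_calls"):
            return reply["content"]  # no tool requested: this is the final answer
        for call in reply["tool_calls"]:
            args = json.loads(call["function"]["arguments"])
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(get_weather(**args)),
            })
    raise RuntimeError("no final answer within max_steps")

history = [{"role": "user", "content": "What's the weather like in Paris?"}]
answer = run_turn(history)
```

Each pass through the loop is one Step. The Turn ends when the model replies without requesting a tool.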
Diagram: Turn/Step structure with tool results
The diagram below illustrates how a single Turn can contain multiple Steps, and how the model may continue producing reasoning_content after tool results (i.e., after you send role="tool" messages). For correct tool-calling behavior, preserve and replay reasoning_content exactly as received across the entire sequence.
2. Tool Calling: The Non-Negotiable Rule
When using tool calling with DeepSeek R1 or GLM-5.1, the API may return additional structured output in a dedicated field:
- reasoning_content
You must preserve reasoning_content exactly as received, and send it back unchanged in subsequent requests.
What must be preserved (including after tool results)
To be explicit: preservation is required not only for reasoning_content produced after user messages, but also for reasoning_content produced after tool results.
In tool-enabled flows, the model may produce reasoning:
- before any tool call,
- between multiple tool calls,
- and after receiving tool results (i.e., after you send role="tool" messages and the model continues).
You must preserve and replay all such reasoning_content exactly as generated.
This includes:
- Content emitted before any tool calls
- Content emitted between tool calls (multi-step tool chaining)
- Content emitted after tool results
- Any reasoning_content segments produced across turns (keep the original order)
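To make the cases above concrete, here is an illustrative message history for one turn with two tool steps. All values are made up, and the helper `reasoning_segments` is a hypothetical name introduced only for this sketch. It shows reasoning before the first tool call, between tool calls, and after the last tool result, each kept on its own assistant message in the original order.

```python
# Illustrative history only; the texts, IDs, and tool results are invented.
history = [
    {"role": "user", "content": "Compare the weather in Paris and Rome."},
    {   # reasoning before any tool call
        "role": "assistant", "content": "",
        "reasoning_content": "Start with Paris.",
        "tool_calls": [{"id": "call_1", "type": "function",
                        "function": {"name": "get_weather",
                                     "arguments": '{"city": "Paris"}'}}],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": '{"weather": "Sunny"}'},
    {   # reasoning between tool calls
        "role": "assistant", "content": "",
        "reasoning_content": "Paris is done; now Rome.",
        "tool_calls": [{"id": "call_2", "type": "function",
                        "function": {"name": "get_weather",
                                     "arguments": '{"city": "Rome"}'}}],
    },
    {"role": "tool", "tool_call_id": "call_2", "content": '{"weather": "Rainy"}'},
    {   # reasoning after tool results, followed by the final answer
        "role": "assistant",
        "content": "Paris is sunny; Rome is rainy.",
        "reasoning_content": "Both results are in, so I can compare them.",
    },
]

def reasoning_segments(messages):
    """Return the reasoning_content of each assistant message, in history order."""
    return [m["reasoning_content"] for m in messages
            if m["role"] == "assistant" and "reasoning_content" in m]

segments = reasoning_segments(history)
```

Here `segments` holds three separate strings, one per assistant message. Concatenating them or moving them to a different message would count as merging or reordering.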
What you must NOT do
Do NOT:
- Modify the text
- “Clean up” or post-process it
- Merge or split segments
- Reorder segments
- Drop it while keeping only normal assistant text
If you do, you may see:
- Broken multi-step behavior around tools
- Instability across tool calls
- Reduced cache efficiency and degraded output quality
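One way to catch accidental modification on the client side is to fingerprint reasoning_content when it is received and check it again before sending. This is only a sketch of a local safeguard, not something the API requires. The helper names `fingerprint`, `receive`, and `is_verbatim` are hypothetical.

```python
import hashlib

def fingerprint(text):
    """Hash the exact UTF-8 bytes of a reasoning_content string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def receive(message):
    """Store the message together with a fingerprint taken at receipt time."""
    return {"message": message, "sha256": fingerprint(message["reasoning_content"])}

def is_verbatim(record):
    """True only if reasoning_content is byte-for-byte what was received."""
    return fingerprint(record["message"]["reasoning_content"]) == record["sha256"]

record = receive({
    "role": "assistant",
    "content": "",
    "reasoning_content": "  Check the forecast first.\nThen answer.  ",
})
intact = is_verbatim(record)

# A well-meant cleanup such as strip() changes the text and is detected.
record["message"]["reasoning_content"] = record["message"]["reasoning_content"].strip()
after_cleanup = is_verbatim(record)
```

Even whitespace trimming fails the check, which matches the rule that "cleaning up" the text counts as a modification.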
3. Client Handling Checklist (Recommended)
When receiving responses:
- Accumulate normal assistant text from content (or delta.content if streaming)
- Accumulate Interleaved Thinking text from reasoning_content (or delta.reasoning_content if streaming)
- Collect tool requests from tool_calls (or delta.tool_calls if streaming)
When sending the assistant message back to the model, include all of them:
- content
- reasoning_content (verbatim, complete, in the exact original order)
- tool_calls (as received)
Note: This rule applies to both streaming and non-streaming. Streaming only affects how you read fields (delta.*), not what you must preserve.
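For the non-streaming case, the checklist above might look like the sketch below. It assumes that reasoning_content is exposed as an attribute on the response message object, which is why it is read with `getattr`. The helper `to_assistant_message` is a hypothetical name, and the `SimpleNamespace` object stands in for `response.choices[0].message` so the sketch runs without a network call.

```python
from types import SimpleNamespace

def to_assistant_message(message):
    """Build the assistant message to replay from a non-streaming response message."""
    out = {"role": "assistant", "content": message.content or ""}
    reasoning = getattr(message, "reasoning_content", None)
    if reasoning is not None:
        out["reasoning_content"] = reasoning  # verbatim: no strip(), no rewriting
    tool_calls = getattr(message, "tool_calls", None)
    if tool_calls:
        out["tool_calls"] = [
            {
                "id": tc.id,
                "type": "function",
                "function": {
                    "name": tc.function.name,
                    "arguments": tc.function.arguments,
                },
            }
            for tc in tool_calls
        ]
    return out

# Stand-in for response.choices[0].message (values are invented).
sample = SimpleNamespace(
    content=None,
    reasoning_content="Need the weather; call get_weather.\n",
    tool_calls=[SimpleNamespace(
        id="call_1",
        type="function",
        function=SimpleNamespace(name="get_weather", arguments='{"city": "Paris"}'),
    )],
)
replay = to_assistant_message(sample)
```

The resulting dict can be appended to the messages list as-is. Note that the trailing newline in reasoning_content survives untouched.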
4. Example: Interleaved Thinking + Tool Calling (DeepSeek R1)
This example demonstrates DeepSeek R1 on MyTokenGate. The same pattern also applies to GLM-5.1.
from openai import OpenAI
import json

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://gateway.mytokengate.com/v1/"
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather information",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"}
            },
            "required": ["city"]
        }
    }
}]

messages = [
    {"role": "system", "content": "You are an assistant"},
    {"role": "user", "content": "What's the weather like in America?"}
]

def read_stream(response):
    """Accumulate reasoning_content, content, and tool_calls from a stream."""
    reasoning_content = ""
    content = ""
    calls = {}  # tool-call index -> assembled tool call
    for chunk in response:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        if getattr(delta, "reasoning_content", None):
            reasoning_content += delta.reasoning_content
        if getattr(delta, "content", None):
            content += delta.content
        # Streamed tool calls arrive as fragments; merge them by index
        for tc in getattr(delta, "tool_calls", None) or []:
            call = calls.setdefault(tc.index, {
                "id": "", "type": "function",
                "function": {"name": "", "arguments": ""}
            })
            if tc.id:
                call["id"] = tc.id
            if tc.function and tc.function.name:
                call["function"]["name"] = tc.function.name
            if tc.function and tc.function.arguments:
                call["function"]["arguments"] += tc.function.arguments
    return reasoning_content, content, [calls[i] for i in sorted(calls)]

def assistant_message(reasoning_content, content, tool_calls):
    """Build the assistant message to replay, keeping reasoning_content verbatim."""
    message = {
        "role": "assistant",
        "content": content,
        "reasoning_content": reasoning_content
    }
    if tool_calls:  # omit the field when the model made no tool calls
        message["tool_calls"] = tool_calls
    return message

# Round 1: model reasons, then calls a tool
response = client.chat.completions.create(
    model="deepseek-r1",
    messages=messages,
    tools=tools,
    stream=True  # optional; the same preservation rule applies to non-streaming
)
reasoning_content, content, tool_calls = read_stream(response)

# Preserve reasoning_content verbatim (Round 1)
messages.append(assistant_message(reasoning_content, content, tool_calls))

# Tool execution result (example; assumes the model requested a tool call)
messages.append({
    "role": "tool",
    "tool_call_id": tool_calls[0]["id"],
    "content": json.dumps({"weather": "Sunny", "temp": "25C"})
})

# Round 2: model may produce NEW reasoning_content AFTER tool results
response = client.chat.completions.create(
    model="deepseek-r1",
    messages=messages,
    tools=tools,
    stream=True  # optional
)
reasoning_content_2, content_2, tool_calls_2 = read_stream(response)

# Preserve reasoning_content verbatim (Round 2, i.e., after tool results)
messages.append(assistant_message(reasoning_content_2, content_2, tool_calls_2))
5. Using GLM-5.1 Instead
To switch the example to GLM-5.1, change:
    model="deepseek-r1"
to:
    model="glm-5.1"
All Interleaved Thinking preservation rules remain the same, including preserving reasoning_content after tool results.
Summary
For DeepSeek R1 and GLM-5.1 on MyTokenGate:
- Interleaved Thinking enables multi-step, tool-aware execution
- With tool calling, you must preserve and replay reasoning_content verbatim—including any reasoning produced after tool results
- Never modify, drop, merge/split, or reorder reasoning_content
Following these rules ensures stable tool use and consistent multi-step behavior.