Interleaved Thinking - MyTokenGate

Important Notice

Some models introduce new Interleaved Thinking behaviors. On MyTokenGate, DeepSeek R1 and GLM-5.1 may emit Interleaved Thinking-related structured output when using the serverless model API—most commonly in tool-calling flows. To ensure correctness, stability, and optimal performance, please follow the guidelines below.

Overview

On MyTokenGate, Interleaved Thinking is currently supported by:

deepseek-r1 - DeepSeek reasoning model
glm-5.1 - GLM model

For all available models, see the Model List.

Interleaved Thinking is especially useful for:

Agent-style orchestration
Tool-calling scenarios
Coding and debugging
Multi-step tasks that benefit from intermediate tool outputs

1. What is Interleaved Thinking?

With Interleaved Thinking, a model can:

Decide whether it needs to call a tool
Call a tool
Receive tool results
Continue from intermediate outputs
Decide the next step (call another tool or produce a final answer)

This enables robust multi-step execution where tool outputs can influence subsequent steps.
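The loop described above can be sketched without any network calls. This is a minimal illustration, not MyTokenGate's implementation: `fake_model`, `get_weather`, `TOOLS`, and `run_turn` are hypothetical names, and `fake_model` stands in for a real chat completion request.

```python
import json

def fake_model(messages):
    """Hypothetical stand-in for a chat completion call; returns an assistant message."""
    if not any(m["role"] == "tool" for m in messages):
        # Step 1: the model reasons and decides it needs a tool
        return {
            "role": "assistant",
            "content": "",
            "reasoning_content": "I need current weather data for Paris.",
            "tool_calls": [{
                "id": "call_1",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": json.dumps({"city": "Paris"}),
                },
            }],
        }
    # Step 2: the model reasons again after the tool result, then answers
    return {
        "role": "assistant",
        "content": "It is sunny in Paris.",
        "reasoning_content": "The tool reports sunny weather, so I can answer.",
    }

def get_weather(city):
    """Hypothetical tool implementation."""
    return {"city": city, "weather": "Sunny"}

TOOLS = {"get_weather": get_weather}

def run_turn(messages, model=fake_model, max_steps=5):
    """Run one Turn: repeat Steps until the model stops requesting tools."""
    for _ in range(max_steps):
        reply = model(messages)
        messages.append(reply)  # appended unchanged, reasoning_content included
        calls = reply.get("tool_calls") or []
        if not calls:
            return reply["content"]
        for call in calls:
            fn = TOOLS[call["function"]["name"]]
            args = json.loads(call["function"]["arguments"])
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(fn(**args)),
            })
    raise RuntimeError("step limit reached without a final answer")

history = [{"role": "user", "content": "What's the weather in Paris?"}]
answer = run_turn(history)
```

After the call, `history` holds the user message, the first assistant message (with its reasoning and tool call), the tool result, and the final assistant message, in that order.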

Diagram: Turn/Step structure with tool results

The diagram below illustrates how a single Turn can contain multiple Steps, and how the model may continue producing reasoning_content after tool results (i.e., after you send role="tool" messages). For correct tool-calling behavior, preserve and replay reasoning_content exactly as received across the entire sequence.

2. Tool Calling: The Non-Negotiable Rule

When using tool calling with DeepSeek R1 or GLM-5.1, the API may return additional structured output in a dedicated field:

reasoning_content

You must preserve reasoning_content exactly as received, and send it back unchanged in subsequent requests.

What must be preserved (including after tool results)

To be explicit: preservation is required not only for reasoning_content produced after user messages, but also for reasoning_content produced after tool results.

In tool-enabled flows, the model may produce reasoning:

before any tool call,
between multiple tool calls,
and after receiving tool results (i.e., after you send role="tool" messages and the model continues).

You must preserve and replay all such reasoning_content exactly as generated.

This includes:

Content emitted before any tool calls
Content emitted between tool calls (multi-step tool chaining)
Content emitted after tool results
Any reasoning_content segments produced across turns (keep the original order)

What you must NOT do

Do NOT:

Modify the text
“Clean up” or post-process it
Merge or split segments
Reorder segments
Drop it while keeping only normal assistant text

If you do, you may see:

Broken multi-step behavior around tools
Instability across tool calls
Reduced cache efficiency and degraded output quality
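One optional way to catch accidental edits on the client side is to fingerprint each reasoning_content segment when it arrives and compare before every request. This is only a sketch of such a safeguard, not part of the MyTokenGate API: `fingerprint`, `record`, and `verify` are hypothetical helper names.

```python
import hashlib

def fingerprint(text):
    """Hash the exact bytes of a reasoning_content segment."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def record(message, ledger):
    """Store a fingerprint for an assistant message at the moment it is received."""
    ledger.append(fingerprint(message.get("reasoning_content", "")))

def verify(messages, ledger):
    """Return True if every assistant segment is unchanged and in the original order."""
    current = [
        fingerprint(m.get("reasoning_content", ""))
        for m in messages
        if m["role"] == "assistant"
    ]
    return current == ledger

ledger = []
history = [{"role": "user", "content": "hi"}]
reply = {
    "role": "assistant",
    "content": "",
    "reasoning_content": "  Check the weather first.\n",
}
history.append(reply)
record(reply, ledger)
ok_before = verify(history, ledger)

# Simulate a well-meaning "clean up" that strips whitespace
tampered = [dict(m) for m in history]
tampered[1]["reasoning_content"] = tampered[1]["reasoning_content"].strip()
ok_after = verify(tampered, ledger)
```

Here `ok_before` is true and `ok_after` is false: even whitespace trimming counts as a modification. The same comparison also fails if segments are reordered, merged, split, or dropped.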

3. Client Handling Checklist (Recommended)

When receiving responses:

Accumulate normal assistant text from content (or delta.content if streaming)
Accumulate Interleaved Thinking text from reasoning_content (or delta.reasoning_content if streaming)
Collect tool requests from tool_calls (or delta.tool_calls if streaming)

When sending the assistant message back to the model, include all of them:

content
reasoning_content (verbatim, complete, in the exact original order)
tool_calls (as received)

Note: This rule applies to both streaming and non-streaming. Streaming only affects how you read fields (delta.*), not what you must preserve.
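For the non-streaming case, the checklist might look like the sketch below. It assumes the response message exposes reasoning_content as an attribute and that each tool call carries id, function.name, and function.arguments; `to_replay_message` is a hypothetical helper, and the `received` object is a hand-built stand-in for a real response message.

```python
from types import SimpleNamespace

def to_replay_message(message):
    """Convert a non-streaming assistant message object into a dict to send back."""
    replay = {
        "role": "assistant",
        "content": message.content or "",
    }
    reasoning = getattr(message, "reasoning_content", None)
    if reasoning is not None:
        replay["reasoning_content"] = reasoning  # verbatim: no strip, no edits
    calls = getattr(message, "tool_calls", None) or []
    if calls:
        replay["tool_calls"] = [
            {
                "id": c.id,
                "type": "function",
                "function": {
                    "name": c.function.name,
                    "arguments": c.function.arguments,
                },
            }
            for c in calls
        ]
    return replay

# Hand-built stand-in for illustration; a real object would come from
# client.chat.completions.create(...).choices[0].message
received = SimpleNamespace(
    content=None,
    reasoning_content="The user wants weather data.\nCall get_weather first. ",
    tool_calls=[SimpleNamespace(
        id="call_1",
        function=SimpleNamespace(name="get_weather", arguments='{"city": "Paris"}'),
    )],
)
replay = to_replay_message(received)
```

The helper copies reasoning_content and the tool call arguments as raw strings, so trailing whitespace and newlines survive the round trip.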

4. Example: Interleaved Thinking + Tool Calling (DeepSeek R1)

This example demonstrates DeepSeek R1 on MyTokenGate. The same pattern also applies to GLM-5.1.


from openai import OpenAI
import json
 
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://gateway.mytokengate.com/v1/"
)
 
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather information",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"}
            },
            "required": ["city"]
        }
    }
}]
 
messages = [
    {"role": "system", "content": "You are an assistant"},
    {"role": "user", "content": "What's the weather like in America?"}
]
 
def collect_stream(stream):
    """Accumulate reasoning_content, content, and tool calls from a streamed response."""
    reasoning_content, content, calls_by_index = "", "", {}
    for chunk in stream:
        if not chunk.choices:  # some chunks may carry no choices
            continue
        delta = chunk.choices[0].delta
        if getattr(delta, "reasoning_content", None):
            reasoning_content += delta.reasoning_content
        if getattr(delta, "content", None):
            content += delta.content
        # Streamed tool calls arrive as fragments; merge them by index
        for tc in getattr(delta, "tool_calls", None) or []:
            call = calls_by_index.setdefault(tc.index, {
                "id": "",
                "type": "function",
                "function": {"name": "", "arguments": ""}
            })
            if tc.id:
                call["id"] = tc.id
            if tc.function and tc.function.name:
                call["function"]["name"] += tc.function.name
            if tc.function and tc.function.arguments:
                call["function"]["arguments"] += tc.function.arguments
    tool_calls = [calls_by_index[i] for i in sorted(calls_by_index)]
    return reasoning_content, content, tool_calls

def assistant_message(reasoning_content, content, tool_calls):
    """Build the assistant message to replay, keeping reasoning_content verbatim."""
    message = {
        "role": "assistant",
        "content": content,
        "reasoning_content": reasoning_content
    }
    if tool_calls:  # omit the field when the model requested no tools
        message["tool_calls"] = tool_calls
    return message

# Round 1: model reasons, then calls a tool
response = client.chat.completions.create(
    model="deepseek-r1",
    messages=messages,
    tools=tools,
    stream=True  # optional; the same preservation rule applies to non-streaming
)
reasoning_content, content, tool_calls = collect_stream(response)

# Preserve reasoning_content verbatim (Round 1)
messages.append(assistant_message(reasoning_content, content, tool_calls))

# Tool execution results (example): one role="tool" message per requested call
for call in tool_calls:
    messages.append({
        "role": "tool",
        "tool_call_id": call["id"],
        "content": json.dumps({"weather": "Sunny", "temp": "25C"})
    })

# Round 2: model may produce NEW reasoning_content AFTER tool results
response = client.chat.completions.create(
    model="deepseek-r1",
    messages=messages,
    tools=tools,
    stream=True  # optional
)
reasoning_content_2, content_2, tool_calls_2 = collect_stream(response)

# Preserve reasoning_content verbatim (Round 2, i.e., after tool results)
messages.append(assistant_message(reasoning_content_2, content_2, tool_calls_2))

5. Using GLM-5.1 Instead

To switch the example to GLM-5.1, change:


model="deepseek-r1"

to:


model="glm-5.1"

All Interleaved Thinking preservation rules remain the same, including preserving reasoning_content after tool results.

Summary

For DeepSeek R1 and GLM-5.1 on MyTokenGate:

Interleaved Thinking enables multi-step, tool-aware execution
With tool calling, you must preserve and replay reasoning_content verbatim—including any reasoning produced after tool results
Never modify, drop, merge/split, or reorder reasoning_content

Following these rules ensures stable tool use and consistent multi-step behavior.