
Interleaved Thinking - MyTokenGate

Important Notice

Some models introduce new Interleaved Thinking behaviors. On MyTokenGate, DeepSeek R1 and GLM-5.1 may return additional Interleaved Thinking output (the reasoning_content field) when called through the serverless model API, most commonly in tool-calling flows. To ensure correctness, stability, and optimal performance, follow the guidelines below.


Overview

On MyTokenGate, Interleaved Thinking is currently supported by:

  • deepseek-r1 - DeepSeek reasoning model
  • glm-5.1 - GLM model

The full list of available models is on the Model List page.

Interleaved Thinking is especially useful for:

  • Agent-style orchestration
  • Tool-calling scenarios
  • Coding and debugging
  • Multi-step tasks that benefit from intermediate tool outputs

1. What is Interleaved Thinking?

With Interleaved Thinking, a model can:

  1. Decide whether it needs to call a tool
  2. Call a tool
  3. Receive tool results
  4. Continue from intermediate outputs
  5. Decide the next step (call another tool or produce a final answer)
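The five steps above can be sketched as a simple loop. This is a minimal illustration, not MyTokenGate's implementation: `call_model` is a hypothetical function that sends `messages` to the API and returns the assistant message as a dict, and `fake_model` is a scripted stand-in so the sketch runs offline.

```python
import json
from typing import Callable

# Hypothetical: call_model(messages) -> one assistant message as a dict with
# "content", an optional "reasoning_content", and optional "tool_calls".
ModelFn = Callable[[list], dict]


def run_turn(call_model: ModelFn, tool_impls: dict, messages: list, max_steps: int = 8) -> str:
    """Run one Turn: call the model, execute requested tools, repeat until it answers."""
    for _ in range(max_steps):
        reply = call_model(messages)
        # Steps 1-2: the model decided whether to call tools; store its message unchanged
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            # Step 5: no further tool calls, so this is the final answer
            return reply.get("content") or ""
        for call in tool_calls:
            # Step 3: execute each tool and return its result as a role="tool" message
            impl = tool_impls[call["function"]["name"]]
            args = json.loads(call["function"]["arguments"] or "{}")
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(impl(**args)),
            })
        # Step 4: loop again so the model continues from the tool results
    raise RuntimeError("no final answer within max_steps")


# Scripted stand-in for the real API so the sketch runs offline
def fake_model(messages):
    if messages[-1]["role"] == "user":
        return {"role": "assistant", "content": "",
                "reasoning_content": "I need the weather first.",
                "tool_calls": [{"id": "call_1", "type": "function",
                                "function": {"name": "get_weather",
                                             "arguments": '{"city": "Paris"}'}}]}
    return {"role": "assistant", "content": "It is sunny in Paris.",
            "reasoning_content": "The tool result says sunny."}


history = [{"role": "user", "content": "What's the weather in Paris?"}]
answer = run_turn(fake_model, {"get_weather": lambda city: {"city": city, "weather": "Sunny"}}, history)
print(answer)  # It is sunny in Paris.
```

Because the loop appends each reply dict as-is, any reasoning_content the model returns is replayed on the next call without modification.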

This enables robust multi-step execution where tool outputs can influence subsequent steps.

Diagram: Turn/Step structure with tool results

The diagram below illustrates how a single Turn can contain multiple Steps, and how the model may continue producing reasoning_content after tool results (i.e., after you send role="tool" messages). For correct tool-calling behavior, preserve and replay reasoning_content exactly as received across the entire sequence.
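The structure the diagram describes can also be written out as a message history. The transcript below is hypothetical (the city, IDs, and texts are made up), and counting each assistant message as one Step is an assumption made for illustration.

```python
# Hypothetical history for one Turn containing two Steps. Every assistant
# message keeps its reasoning_content exactly as the API returned it.
turn = [
    {"role": "user", "content": "What's the weather in Paris?"},
    # Step 1: reasoning before the tool call, plus the tool call itself
    {"role": "assistant", "content": "",
     "reasoning_content": "I should look up Paris.",
     "tool_calls": [{"id": "call_1", "type": "function",
                     "function": {"name": "get_weather",
                                  "arguments": "{\"city\": \"Paris\"}"}}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "{\"weather\": \"Sunny\"}"},
    # Step 2: NEW reasoning produced after the tool result, then the final answer
    {"role": "assistant", "content": "It is sunny in Paris.",
     "reasoning_content": "The tool says sunny; I can answer now."},
]


def reasoning_segments(messages):
    """Return every reasoning_content segment in its original order."""
    return [m["reasoning_content"] for m in messages
            if m["role"] == "assistant" and "reasoning_content" in m]


def count_steps(messages):
    """Assumption: each assistant message inside a Turn is one Step."""
    return sum(1 for m in messages if m["role"] == "assistant")
```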


2. Tool Calling: The Non-Negotiable Rule

When using tool calling with DeepSeek R1 or GLM-5.1, the API may return additional structured output in a dedicated field:

  • reasoning_content

You must preserve reasoning_content exactly as received, and send it back unchanged in subsequent requests.

What must be preserved (including after tool results)

To be explicit: preservation is required not only for reasoning_content produced after user messages, but also for reasoning_content produced after tool results.

In tool-enabled flows, the model may produce reasoning:

  • before any tool call,
  • between multiple tool calls,
  • and after receiving tool results (i.e., after you send role="tool" messages and the model continues).

You must preserve and replay all such reasoning_content exactly as generated.

This includes:

  • Content emitted before any tool calls
  • Content emitted between tool calls (multi-step tool chaining)
  • Content emitted after tool results
  • Any reasoning_content segments produced across turns (keep the original order)

What you must NOT do

Do NOT:

  • Modify the text
  • “Clean up” or post-process it
  • Merge or split segments
  • Reorder segments
  • Drop it while keeping only normal assistant text
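One way to catch these mistakes before sending a request is to compare what you received with what you are about to send. The helper `check_preserved` below is a hypothetical sketch; it assumes every assistant message in the outgoing history came from the model, in the same order it was received.

```python
def check_preserved(received: list, outgoing: list) -> None:
    """Raise ValueError if any received reasoning_content was modified,
    dropped, merged, split, or reordered in the outgoing history.

    received: assistant messages exactly as the API returned them, in order.
    outgoing: the full messages list you are about to send.
    """
    sent = [m.get("reasoning_content") for m in outgoing if m.get("role") == "assistant"]
    expected = [m.get("reasoning_content") for m in received]
    # Exact list equality: any edit, drop, merge/split, or reorder changes it
    if sent != expected:
        raise ValueError(f"reasoning_content mismatch: expected {expected!r}, got {sent!r}")


# Usage: the received segment has leading spaces and a trailing newline
received = [{"role": "assistant", "reasoning_content": "  Step one.\n"}]
good = [{"role": "user", "content": "hi"},
        {"role": "assistant", "content": "", "reasoning_content": "  Step one.\n"}]
cleaned = [{"role": "user", "content": "hi"},
           {"role": "assistant", "content": "", "reasoning_content": "Step one."}]

check_preserved(received, good)  # passes silently
try:
    check_preserved(received, cleaned)  # .strip()-style cleanup is rejected
except ValueError as err:
    print("rejected:", err)
```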

If you do, you may see:

  • Broken multi-step behavior around tools
  • Instability across tool calls
  • Reduced cache efficiency and degraded output quality

3. Reading and Replaying Responses

When receiving responses:

  • Accumulate normal assistant text from content (or delta.content if streaming)
  • Accumulate Interleaved Thinking text from reasoning_content (or delta.reasoning_content if streaming)
  • Collect tool requests from tool_calls (or delta.tool_calls if streaming; streamed tool calls arrive as fragments that share an index and must be merged)
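The accumulation rules above can be sketched for streamed chunks that have been JSON-decoded into dicts (the OpenAI-compatible chunk shape). The `accumulate` helper is hypothetical; with the openai SDK you would read the same fields as attributes instead of keys. A streamed tool call arrives as several fragments sharing one `index`, so the fragments are merged into a single call rather than appended as separate calls.

```python
def accumulate(chunks):
    """Merge streamed chunks (JSON-decoded dicts) into one assistant message."""
    content, reasoning = [], []
    calls = {}  # tool-call fragments keyed by their "index" field
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue  # e.g. a trailing usage-only chunk
        delta = choices[0].get("delta") or {}
        if delta.get("reasoning_content"):
            reasoning.append(delta["reasoning_content"])
        if delta.get("content"):
            content.append(delta["content"])
        for tc in delta.get("tool_calls") or []:
            slot = calls.setdefault(tc["index"], {
                "id": "", "type": "function",
                "function": {"name": "", "arguments": ""},
            })
            if tc.get("id"):
                slot["id"] = tc["id"]
            fn = tc.get("function") or {}
            if fn.get("name"):
                slot["function"]["name"] = fn["name"]
            if fn.get("arguments"):
                # Arguments are streamed as partial JSON text; concatenate in order
                slot["function"]["arguments"] += fn["arguments"]
    message = {"role": "assistant", "content": "".join(content),
               "reasoning_content": "".join(reasoning)}
    if calls:
        message["tool_calls"] = [calls[i] for i in sorted(calls)]
    return message


chunks = [
    {"choices": [{"delta": {"reasoning_content": "Need the "}}]},
    {"choices": [{"delta": {"reasoning_content": "weather."}}]},
    {"choices": [{"delta": {"tool_calls": [{"index": 0, "id": "call_1",
        "function": {"name": "get_weather", "arguments": "{\"city\": "}}]}}]},
    {"choices": [{"delta": {"tool_calls": [{"index": 0,
        "function": {"arguments": "\"Paris\"}"}}]}}]},
    {"choices": []},
]
msg = accumulate(chunks)
```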

When sending the assistant message back to the model, include all of them:

  • content
  • reasoning_content (verbatim, complete, in the exact original order)
  • tool_calls (as received)
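For non-streaming responses, building the outgoing message is a straight copy. This sketch assumes `message` is `choices[0].message` as a plain dict, for example taken from the raw JSON response body; the helper name `to_history_entry` is hypothetical.

```python
def to_history_entry(message: dict) -> dict:
    """Copy content, reasoning_content, and tool_calls through unchanged."""
    entry = {"role": "assistant", "content": message.get("content")}
    # reasoning_content is passed through verbatim, including whitespace
    if message.get("reasoning_content") is not None:
        entry["reasoning_content"] = message["reasoning_content"]
    # tool_calls are sent back exactly as received
    if message.get("tool_calls"):
        entry["tool_calls"] = message["tool_calls"]
    return entry


resp_message = {
    "role": "assistant",
    "content": None,
    "reasoning_content": "Check the weather first.\n",
    "tool_calls": [{"id": "call_9", "type": "function",
                    "function": {"name": "get_weather",
                                 "arguments": "{\"city\": \"Oslo\"}"}}],
}
entry = to_history_entry(resp_message)
```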

Note: This rule applies to both streaming and non-streaming. Streaming only affects how you read fields (delta.*), not what you must preserve.


4. Example: Interleaved Thinking + Tool Calling (DeepSeek R1)

This example demonstrates DeepSeek R1 on MyTokenGate. The same pattern also applies to GLM-5.1.

from openai import OpenAI
import json

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://gateway.mytokengate.com/v1/",
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather information",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [
    {"role": "system", "content": "You are an assistant"},
    {"role": "user", "content": "What's the weather like in New York?"},
]


def collect_stream(response):
    """Accumulate content, reasoning_content, and tool_calls from a streamed response."""
    reasoning_content = ""
    content = ""
    calls = {}  # streamed tool calls arrive in fragments keyed by index
    for chunk in response:
        if not chunk.choices:
            continue  # e.g. a usage-only chunk
        delta = chunk.choices[0].delta
        if getattr(delta, "reasoning_content", None):
            reasoning_content += delta.reasoning_content
        if getattr(delta, "content", None):
            content += delta.content
        for tc in getattr(delta, "tool_calls", None) or []:
            slot = calls.setdefault(tc.index, {
                "id": "", "type": "function",
                "function": {"name": "", "arguments": ""},
            })
            if tc.id:
                slot["id"] = tc.id
            if tc.function and tc.function.name:
                slot["function"]["name"] = tc.function.name
            if tc.function and tc.function.arguments:
                slot["function"]["arguments"] += tc.function.arguments
    return content, reasoning_content, [calls[i] for i in sorted(calls)]


def assistant_message(content, reasoning_content, tool_calls):
    """Build the assistant message to send back, with reasoning_content verbatim."""
    msg = {"role": "assistant", "content": content, "reasoning_content": reasoning_content}
    if tool_calls:
        msg["tool_calls"] = tool_calls
    return msg


# Round 1: model reasons, then calls a tool
response = client.chat.completions.create(
    model="deepseek-r1",
    messages=messages,
    tools=tools,
    stream=True,  # optional; the same preservation rule applies to non-streaming
)
content, reasoning_content, tool_calls = collect_stream(response)

# Preserve reasoning_content verbatim (Round 1)
messages.append(assistant_message(content, reasoning_content, tool_calls))

# Tool execution result (example): answer every tool call by its id
for call in tool_calls:
    messages.append({
        "role": "tool",
        "tool_call_id": call["id"],
        "content": json.dumps({"weather": "Sunny", "temp": "25C"}),
    })

# Round 2: model may produce NEW reasoning_content AFTER tool results
response = client.chat.completions.create(
    model="deepseek-r1",
    messages=messages,
    tools=tools,
    stream=True,  # optional
)
content_2, reasoning_content_2, tool_calls_2 = collect_stream(response)

# Preserve reasoning_content verbatim (Round 2, i.e., after tool results)
messages.append(assistant_message(content_2, reasoning_content_2, tool_calls_2))

5. Using GLM-5.1 Instead

To switch the example to GLM-5.1, change:

model="deepseek-r1"

to:

model="glm-5.1"

All Interleaved Thinking preservation rules remain the same, including preserving reasoning_content after tool results.


Summary

For DeepSeek R1 and GLM-5.1 on MyTokenGate:

  1. Interleaved Thinking enables multi-step, tool-aware execution
  2. With tool calling, you must preserve and replay reasoning_content verbatim, including any reasoning produced after tool results
  3. Never modify, drop, merge/split, or reorder reasoning_content

Following these rules ensures stable tool use and consistent multi-step behavior.
