Chat Completions

Creates a model response for the given conversation.

Endpoint


POST https://gateway.mytokengate.com/v1/chat/completions

Example request


curl --request POST \
  --url https://gateway.mytokengate.com/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "What opportunities and challenges will the Chinese large-model industry face in 2025?"
      }
    ]
  }'

Authentication

Every request must include a Bearer token in the `Authorization` header:


Authorization: Bearer YOUR_API_KEY
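If you call the endpoint without an SDK, the same request can be assembled with Python's standard library. The sketch below only builds the `urllib.request.Request` object and does not send it; `build_chat_request` is an illustrative name, and passing the key in as an argument (rather than hard-coding it) is a suggestion, not an API requirement.

```python
import json
import urllib.request

ENDPOINT = "https://gateway.mytokengate.com/v1/chat/completions"


def build_chat_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that carries the Bearer token."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "YOUR_API_KEY",
    {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]},
)
# To actually send it: urllib.request.urlopen(req) performs the HTTP call
```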

Request parameters

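Parameter	Type	Required	Description
`model`	string	Yes	Model name; see the model list
`messages`	array	Yes	List of conversation messages; see Message format
`max_completion_tokens`	integer	No	Maximum number of tokens to generate, including reasoning tokens; preferred over `max_tokens`
`max_tokens`	integer	No	Deprecated; use `max_completion_tokens` instead
`temperature`	float	No	Controls randomness, range 0 to 2; the default varies by model
`top_p`	float	No	Nucleus sampling parameter, range 0 to 1
`stream`	boolean	No	If true, the response is streamed as Server-Sent Events, ending with `data: [DONE]`
`stream_options`	object	No	Set `{ "include_usage": true }` to receive usage data in a final chunk of the stream (that chunk's `choices` array is empty)
`stop`	string or array	No	Up to 4 stop sequences; generation stops when one of them is produced
`n`	integer	No	Number of completions to return; defaults to 1
`response_format`	object	No	Output format; see JSON mode
`tools`	array	No	List of tools; see Function Calling
`tool_choice`	string or object	No	Controls how the model uses tools; see Function Calling
`reasoning_effort`	string	No	Reasoning depth, for o-series reasoning models only: `low`, `medium`, or `high`
`presence_penalty`	float	No	Presence penalty, range -2.0 to 2.0
`frequency_penalty`	float	No	Frequency penalty, range -2.0 to 2.0
`seed`	integer	No	Sampling seed for best-effort deterministic output
`user`	string	No	End-user identifier, used for abuse monitoring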
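The server validates these parameters itself, so client-side checks are optional. As a sketch, the hypothetical helper `check_params` below (not part of any SDK) enforces the ranges and limits listed in the table and rewrites the deprecated `max_tokens` to `max_completion_tokens`.

```python
from typing import Any

# Ranges and limits taken from the parameter table above
RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "presence_penalty": (-2.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
}
REASONING_EFFORTS = {"low", "medium", "high"}


def check_params(params: dict[str, Any]) -> dict[str, Any]:
    """Return a checked copy of params; raise ValueError on out-of-range values."""
    out = dict(params)
    # Replace deprecated max_tokens, but never override an explicit max_completion_tokens
    if "max_tokens" in out:
        legacy = out.pop("max_tokens")
        out.setdefault("max_completion_tokens", legacy)
    for name, (low, high) in RANGES.items():
        if name in out and not low <= out[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    stop = out.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        raise ValueError("stop accepts at most 4 sequences")
    if "n" in out and out["n"] < 1:
        raise ValueError("n must be at least 1")
    effort = out.get("reasoning_effort")
    if effort is not None and effort not in REASONING_EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {sorted(REASONING_EFFORTS)}")
    return out


body = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}],
    **check_params({"temperature": 0.7, "max_tokens": 256}),
}
```

Here `body` ends up with `max_completion_tokens: 256` and no `max_tokens` key.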

The `reasoning_effort` parameter

Controls how deeply the model reasons before answering. Supported only by o-series reasoning models (o3, o4-mini, etc.):

Value	Description
`low`	Shallow reasoning; faster responses
`medium`	Moderate reasoning depth
`high`	Deep reasoning; more accurate but slower

Message format

System message

Gives the model instructions:


{ "role": "system", "content": "You are a helpful assistant." }

For o-series models (o3, o4-mini), the `developer` role is recommended instead:


{ "role": "developer", "content": "You are a coding expert." }
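Because the recommended role depends on the model, a client that targets several models can pick it programmatically. A sketch, assuming o-series models can be recognized by names that start with `o` followed by a digit; that naming rule and the helper `instruction_message` are assumptions, not part of the API.

```python
import re

# Assumption: o-series model names look like "o3", "o4-mini", ...
O_SERIES = re.compile(r"^o\d")


def instruction_message(model: str, text: str) -> dict:
    """Use the developer role for o-series models and the system role otherwise."""
    role = "developer" if O_SERIES.match(model) else "system"
    return {"role": role, "content": text}


messages = [
    instruction_message("o4-mini", "You are a coding expert."),
    {"role": "user", "content": "Hello!"},
]
```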

User message (text only)


{ "role": "user", "content": "Hello!" }

User message (with an image)


{
  "role": "user",
  "content": [
    { "type": "text", "text": "What's in this image?" },
    {
      "type": "image_url",
      "image_url": {
        "url": "https://example.com/photo.jpg",
        "detail": "auto"
      }
    }
  ]
}

The `detail` parameter controls the resolution at which the image is processed:

Value	Description
`auto`	The model chooses the resolution (default)
`low`	Low resolution; consumes fewer tokens
`high`	High resolution; consumes more tokens
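Besides public URLs, the OpenAI API also accepts images inline as base64 `data:` URLs in the `url` field. Assuming the gateway forwards such URLs unchanged (not confirmed on this page), a local image could be attached as sketched below; `image_message` is a hypothetical helper.

```python
import base64


def image_message(text: str, image_bytes: bytes, mime: str = "image/png", detail: str = "auto") -> dict:
    """Build a user message that embeds raw image bytes as a base64 data URL."""
    if detail not in ("auto", "low", "high"):
        raise ValueError("detail must be auto, low, or high")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime};base64,{encoded}", "detail": detail},
            },
        ],
    }


# Usage with a local file (the path is illustrative):
# with open("photo.png", "rb") as f:
#     msg = image_message("What's in this image?", f.read(), detail="low")
```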

Assistant message (with tool calls)


{
  "role": "assistant",
  "content": null,
  "tool_calls": [
    {
      "id": "call_abc123",
      "type": "function",
      "function": {
        "name": "get_weather",
        "arguments": "{\"location\": \"Paris\"}"
      }
    }
  ]
}

Tool result message


{
  "role": "tool",
  "tool_call_id": "call_abc123",
  "content": "Paris is 22°C and sunny today"
}
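Each tool call in an assistant message should be answered by a tool message whose `tool_call_id` equals that call's `id`. The sketch below works on plain dicts in the format shown above; `get_weather` and the `TOOLS` registry are hypothetical stand-ins for your own functions.

```python
import json
from typing import Callable


def get_weather(location: str) -> str:
    # Hypothetical stub; a real implementation would query a weather service
    return f"{location}: 22°C, sunny"


# Hypothetical registry mapping tool names to local implementations
TOOLS: dict[str, Callable[..., str]] = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict) -> list[dict]:
    """Execute every tool call and return one tool message per tool_call_id."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        name = call["function"]["name"]
        args = json.loads(call["function"]["arguments"])  # arguments is a JSON string
        fn = TOOLS.get(name)
        content = fn(**args) if fn else f"Unknown function: {name}"
        results.append({"role": "tool", "tool_call_id": call["id"], "content": content})
    return results
```

The returned messages are appended to the conversation right after the assistant message that requested them.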

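Response

A non-streaming request returns a `chat.completion` object like the one below. The `tool_calls` field is shown for illustration: it is present only when the model calls a tool, and `finish_reason` is then usually `tool_calls`.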


{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1677858242,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Response content...",
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "function_name",
              "arguments": "{\"arg\": \"value\"}"
            }
          }
        ]
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 456,
    "total_tokens": 579,
    "prompt_tokens_details": {
      "cached_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0
    }
  }
}

`finish_reason` values

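Value	Description
`stop`	The model finished naturally or produced a stop sequence
`length`	The `max_completion_tokens` limit was reached
`tool_calls`	The model called one or more tools
`content_filter`	Content was omitted by the content filter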
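A client typically checks `finish_reason` before trusting the output, for example to detect a reply cut off by the token limit. One possible mapping is sketched below; the action labels are arbitrary names chosen for illustration.

```python
# Illustrative policy: map each documented finish_reason to a client-side action
ACTIONS = {
    "stop": "done",             # complete answer
    "length": "truncated",      # raise max_completion_tokens or ask the model to continue
    "tool_calls": "run_tools",  # execute the requested tools, then call the API again
    "content_filter": "filtered",
}


def next_step(choice: dict) -> str:
    """Return the action for a choice; null (mid-stream) or unlisted values map to 'unknown'."""
    return ACTIONS.get(choice.get("finish_reason"), "unknown")
```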

`usage` fields

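Field	Description
`prompt_tokens`	Number of input tokens
`completion_tokens`	Number of output tokens, including reasoning tokens
`total_tokens`	Total tokens (`prompt_tokens` + `completion_tokens`)
`prompt_tokens_details.cached_tokens`	Input tokens served from the cache (counted within `prompt_tokens`)
`completion_tokens_details.reasoning_tokens`	Tokens spent on reasoning, for o-series models (counted within `completion_tokens`)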
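In the OpenAI API, cached tokens are a subset of `prompt_tokens` and reasoning tokens are a subset of `completion_tokens`. Assuming the gateway reports them the same way, the sketch below splits the totals into their parts, for example to see how much of the output was visible text; `usage_breakdown` is a hypothetical helper.

```python
def usage_breakdown(usage: dict) -> dict:
    """Split a usage object into cached/uncached input and visible/reasoning output."""
    # The *_details objects may be missing or null, so fall back to 0
    cached = (usage.get("prompt_tokens_details") or {}).get("cached_tokens", 0)
    reasoning = (usage.get("completion_tokens_details") or {}).get("reasoning_tokens", 0)
    return {
        "uncached_prompt": usage["prompt_tokens"] - cached,
        "cached_prompt": cached,
        "visible_output": usage["completion_tokens"] - reasoning,
        "reasoning_output": reasoning,
        "total": usage["total_tokens"],
    }
```

For the example response above this yields 123 uncached input tokens and 456 visible output tokens, since both cached and reasoning counts are 0.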

Streaming

Set `stream: true` to receive the response as Server-Sent Events (SSE):


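from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://gateway.mytokengate.com/v1"
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    stream_options={"include_usage": True}
)

for chunk in stream:
    # With include_usage, the final chunk has an empty choices list and carries usage
    if not chunk.choices:
        if chunk.usage:
            print(f"\n[usage] {chunk.usage.total_tokens} tokens")
        continue
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="")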

Streaming chunk format

Each chunk has the following JSON structure:


{
  "id": "chatcmpl-abc",
  "object": "chat.completion.chunk",
  "created": 1234,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant",
        "content": "token text"
      },
      "finish_reason": null
    }
  ]
}

The stream ends with this terminator:


data: [DONE]
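When consuming the stream without the SDK, each event arrives as a `data:` line. A minimal parser sketch, assuming each chunk's JSON arrives on a single `data:` line and `n` is 1; `iter_sse_chunks` and `collect_text` are illustrative names.

```python
import json
from typing import Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed chunk objects from raw SSE lines, stopping at data: [DONE]."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and any non-data SSE fields or comments
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate delta.content of choice 0; usage-only chunks have no choices."""
    parts = []
    for chunk in iter_sse_chunks(lines):
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```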

Streaming reasoning content

For models that return their reasoning (DeepSeek R1, GLM-5.1, etc.), the `delta` object may include a `reasoning_content` field:


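for chunk in stream:
    if not chunk.choices:
        # Usage-only chunk (sent when include_usage is enabled)
        continue
    delta = chunk.choices[0].delta
    if getattr(delta, "reasoning_content", None):
        # Reasoning/thinking content
        print(f"[thinking] {delta.reasoning_content}")
    if delta.content:
        # Final response text
        print(delta.content, end="")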

For the important rules on preserving `reasoning_content` across turns of a multi-turn conversation, see Interleaved Thinking.
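To keep the reasoning and the final answer as two complete strings, accumulate each field on its own. The sketch below works on chunks already decoded into dicts (for example, parsed from raw SSE `data:` lines); `split_reasoning` is an illustrative name, and usage-only chunks with an empty `choices` array are simply skipped.

```python
def split_reasoning(chunks: list[dict]) -> tuple[str, str]:
    """Return (reasoning_text, answer_text) accumulated from streamed chunk dicts."""
    reasoning, answer = [], []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            # reasoning_content only appears for models that expose their reasoning
            if delta.get("reasoning_content"):
                reasoning.append(delta["reasoning_content"])
            if delta.get("content"):
                answer.append(delta["content"])
    return "".join(reasoning), "".join(answer)
```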

Function calling

Add the `tools` parameter to a request to enable function calling:


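import json
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://gateway.mytokengate.com/v1"
)


def get_weather(city: str) -> str:
    # Hypothetical stub: replace with a call to a real weather service
    return f"{city}: 22°C, sunny"


# Map tool names to the local functions that implement them
available_functions = {"get_weather": get_weather}

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

messages = [{"role": "user", "content": "What's the weather like in Paris?"}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools
)

message = response.choices[0].message
if message.tool_calls:
    # Keep the assistant message that requested the tool calls in the history
    messages.append(message)

    # Execute each requested call and return its result as a tool message
    for tool_call in message.tool_calls:
        function = available_functions[tool_call.function.name]
        function_args = json.loads(tool_call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": function(**function_args)
        })

    # Continue the conversation with the tool results
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools
    )

print(response.choices[0].message.content)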

`tool_choice` options

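Value	Description
`"auto"`	The model decides whether to call tools (default when `tools` is provided)
`"none"`	The model does not call any tool
`"required"`	The model must call at least one tool
`{"type": "function", "function": {"name": "xxx"}}`	The model must call the named function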

For a complete example, see Function Calling.
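When the same code sometimes lets the model decide and sometimes forces a specific function, a small helper can produce either form of `tool_choice` from the table above. A sketch; `tool_choice` here is a hypothetical helper whose return value is what you would pass as the `tool_choice` argument.

```python
from typing import Optional, Union

MODES = ("auto", "none", "required")


def tool_choice(mode: str = "auto", force: Optional[str] = None) -> Union[str, dict]:
    """Return a tool_choice value: a named function if force is given, else a mode string."""
    if force is not None:
        # Object form: the model must call exactly this function
        return {"type": "function", "function": {"name": force}}
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}")
    return mode
```

For example, `tool_choice(force="get_weather")` forces the weather tool, while `tool_choice("none")` disables tools for a single turn without removing them from `tools`.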

Error codes

Status code	Description
400	Invalid request parameters
401	Missing or invalid API key
404	Model not found
429	Rate limit exceeded
503/504	Service temporarily unavailable
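Of these, 429 and 503/504 usually indicate transient conditions, while 400, 401 and 404 will fail again unless the request changes. The OpenAI Python SDK already retries some errors itself (configurable with `max_retries`); for raw HTTP clients, a minimal backoff sketch follows. It assumes `send` is your own function that performs one request and returns `(status, body)`; the attempt count and delays are illustrative, not limits published by the gateway.

```python
import random
import time
from typing import Callable

# Assumption: only these statuses are worth retrying
RETRYABLE = {429, 503, 504}


def call_with_retry(
    send: Callable[[], tuple[int, dict]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, dict]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        # Exponential backoff (1s, 2s, 4s, ...) plus up to 10% random jitter
        delay = base_delay * (2 ** attempt)
        sleep(delay + random.uniform(0, delay * 0.1))
    # Only reached when max_attempts < 1
    raise ValueError("max_attempts must be at least 1")
```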