Stream Mode - MyTokenGate

1. Using Stream Mode in Python

1.1 Stream Mode with the OpenAI Library

For most scenarios, we recommend using the OpenAI library for stream mode. With stream=True, the call returns an iterator of chunks instead of a single complete response.


from openai import OpenAI
 
client = OpenAI(
    base_url='https://gateway.mytokengate.com/v1',
    api_key='your-api-key'
)
 
# Send a request with streaming output
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Tell me about the history of artificial intelligence"}
    ],
    stream=True  # Enable streaming output
)
 
# Receive and print the response incrementally, chunk by chunk
for chunk in response:
    if not chunk.choices:  # skip chunks that carry no choices
        continue
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
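
If the complete text is needed after streaming finishes, the deltas can be accumulated while they arrive. The following is a minimal sketch, not part of the gateway or the OpenAI library: collect_stream_text and fake_chunk are hypothetical helper names, and the fake chunks only imitate the attribute shape (choices[0].delta.content) used in the loop above so the example runs without a network call.

```python
from types import SimpleNamespace


def collect_stream_text(chunks):
    """Join the text deltas from an iterable of chat-completion chunks."""
    parts = []
    for chunk in chunks:
        # Chunks without choices are skipped, as in the loop above.
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if content:
            parts.append(content)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in object shaped like a streamed chunk (illustration only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Simulated stream: two text deltas, one chunk with no choices, one empty delta.
demo_chunks = [
    fake_chunk("Hel"),
    SimpleNamespace(choices=[]),
    fake_chunk("lo"),
    fake_chunk(None),
]
demo_text = collect_stream_text(demo_chunks)
print(demo_text)
```

In real use, the response object returned by client.chat.completions.create(..., stream=True) would be passed in place of demo_chunks.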

1.2 Stream Mode with the Requests Library

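If you call the API with the requests library instead of the OpenAI library, stream mode must be enabled in two places: set "stream": True in the JSON payload so the server streams its output, and pass stream=True to requests.post so the client reads the response incrementally instead of waiting for the full body.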


import requests
import json
 
url = "https://gateway.mytokengate.com/v1/chat/completions"
 
payload = {
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": "Tell me about the history of artificial intelligence"
        }
    ],
    "stream": True  # Set to stream mode here
}
 
headers = {
    "accept": "application/json",
    "content-type": "application/json",
    "authorization": "Bearer your-api-key"
}
 
response = requests.post(url, json=payload, headers=headers, stream=True)  # Specify stream mode in the request
 
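# Print the streamed content as it arrives
if response.status_code == 200:
    full_content = ""
    for chunk in response.iter_lines():
        if not chunk:
            continue  # skip blank separator lines
        chunk_str = chunk.decode('utf-8')
        if not chunk_str.startswith('data:'):
            continue  # skip anything that is not a data line
        # Strip only the leading prefix so 'data: ' inside the content is preserved
        chunk_str = chunk_str[len('data:'):].strip()
        if chunk_str == "[DONE]":
            break
        chunk_data = json.loads(chunk_str)
        choices = chunk_data.get('choices') or []
        if not choices:
            continue
        delta = choices[0].get('delta', {})
        content = delta.get('content', '')
        if content:
            print(content, end="", flush=True)
            full_content += content
else:
    print(f"Request failed, status code: {response.status_code}")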
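
The per-line handling can also be isolated in a small function, which makes it testable without sending a request. This is a sketch under assumptions: extract_content is a hypothetical helper name, and the line format (a "data:" prefix followed by a JSON object, with "[DONE]" as the end marker) is inferred from the example above rather than from a gateway specification.

```python
import json


def extract_content(raw_line):
    """Return the text delta carried by one raw stream line, or None."""
    line = raw_line.decode("utf-8").strip()
    # Ignore blank lines and anything that is not a data line.
    if not line.startswith("data:"):
        return None
    # Remove only the leading prefix; the content itself may contain "data: ".
    data = line[len("data:"):].strip()
    if data == "[DONE]":
        return None
    event = json.loads(data)
    choices = event.get("choices") or []
    if not choices:
        return None
    return choices[0].get("delta", {}).get("content") or None


sample_lines = [
    b'data: {"choices":[{"delta":{"content":"say data: now"}}]}',
    b"",
    b'data: {"choices":[]}',
    b"data: [DONE]",
]
sample_texts = [extract_content(line) for line in sample_lines]
print(sample_texts)
```

Inside the request loop, each value yielded by response.iter_lines() would be passed to extract_content, and None results skipped.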

2. Using Stream Mode in curl

By default, curl buffers the output stream. Passing the -N (or --no-buffer) option disables this buffering, allowing chunks of data to be printed to the terminal immediately.


curl -N -s \
  --request POST \
  --url https://gateway.mytokengate.com/v1/chat/completions \
  --header 'Authorization: Bearer your-api-key' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "gpt-4o",
    "messages": [
        {"role":"user","content":"Tell me a story"}
    ],
    "stream": true
  }'
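
The curl command prints the raw stream lines rather than plain text. To show only the generated text, the output could be piped into a small Python filter. The sketch below is illustrative only: print_stream_text is a hypothetical name, and it assumes the same "data:" line format and "[DONE]" end marker handled in section 1.2. The demo feeds it an in-memory sample instead of real curl output.

```python
import io
import json
import sys


def print_stream_text(lines, out=None):
    """Write the text deltas found in an iterable of stream text lines to `out`."""
    if out is None:
        out = sys.stdout
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators and non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        choices = json.loads(data).get("choices") or []
        if choices:
            text = choices[0].get("delta", {}).get("content")
            if text:
                out.write(text)
                out.flush()  # show each piece as soon as it is parsed


# Demo with an in-memory sample shaped like the assumed stream format.
sample = io.StringIO(
    'data: {"choices":[{"delta":{"content":"Once"}}]}\n'
    "\n"
    'data: {"choices":[{"delta":{"content":" upon"}}]}\n'
    "\n"
    "data: [DONE]\n"
)
buffer = io.StringIO()
print_stream_text(sample, out=buffer)
print(buffer.getvalue())
```

To use it with curl, the demo at the bottom would be replaced by print_stream_text(sys.stdin) and the curl command piped into the script.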