
Stream Mode - MyTokenGate

1. Using Stream Mode in Python

1.1 Stream Mode with the OpenAI Library

For most scenarios, the OpenAI library is the recommended way to use stream mode: point base_url at the gateway and pass stream=True when creating the completion.

    from openai import OpenAI

    client = OpenAI(
        base_url='https://gateway.mytokengate.com/v1',
        api_key='your-api-key'
    )

    # Send a request with streaming output
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": "Tell me about the history of artificial intelligence"}
        ],
        stream=True  # Enable streaming output
    )

    # Gradually receive and process responses
    for chunk in response:
        if not chunk.choices:
            continue
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

1.2 Stream Mode with the Requests Library

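If you call the API with the requests library instead of the OpenAI library, stream mode must be enabled in two places: set "stream": True in the JSON payload so the server streams its response, and pass stream=True to requests.post so the client reads the body incrementally instead of buffering it.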

    import json

    import requests

    url = "https://gateway.mytokengate.com/v1/chat/completions"

    payload = {
        "model": "gpt-4o",
        "messages": [
            {
                "role": "user",
                "content": "Tell me about the history of artificial intelligence"
            }
        ],
        "stream": True  # Ask the server for a streamed response
    }
    headers = {
        "accept": "application/json",
        "content-type": "application/json",
        "authorization": "Bearer your-api-key"
    }

    # stream=True makes requests read the body as it arrives instead of buffering it
    response = requests.post(url, json=payload, headers=headers, stream=True)

    # Print the streamed content as it arrives
    if response.status_code == 200:
        full_content = ""
        for line in response.iter_lines():
            if not line:
                continue  # Skip the blank lines between events
            line_str = line.decode('utf-8')
            if not line_str.startswith('data:'):
                continue  # Ignore anything that is not a data line
            # Strip only the leading prefix so "data: " inside the content is preserved
            chunk_str = line_str[len('data:'):].strip()
            if chunk_str == "[DONE]":
                break  # End-of-stream marker
            chunk_data = json.loads(chunk_str)
            choices = chunk_data.get('choices') or []
            if not choices:
                continue  # Some chunks carry no choices
            delta = choices[0].get('delta', {})
            content = delta.get('content', '')
            if content:
                print(content, end="", flush=True)
                full_content += content
    else:
        print(f"Request failed, status code: {response.status_code}")
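The per-line handling in the loop above can be factored into a small function, which makes it easy to check offline. This is a sketch under the assumption that each event arrives as a `data:`-prefixed line whose payload is either `[DONE]` or a JSON object with a `choices` list, as the example above implies. `parse_sse_line` is a hypothetical name, not part of requests or any SDK.

```python
import json


def parse_sse_line(raw_line):
    """Return the content delta carried by one raw line, or None.

    Hypothetical helper: `raw_line` is a bytes object as yielded by
    response.iter_lines().
    """
    line = raw_line.decode("utf-8").strip()
    # Blank lines and non-data lines carry no content.
    if not line.startswith("data:"):
        return None
    data = line[len("data:"):].strip()
    # The end-of-stream marker is not JSON.
    if data == "[DONE]":
        return None
    event = json.loads(data)
    choices = event.get("choices") or []
    if not choices:
        return None
    delta = choices[0].get("delta") or {}
    # Normalize missing or empty content to None.
    return delta.get("content") or None
```

Inside the streaming loop, this would be used as `content = parse_sse_line(line)`, printing and accumulating the result whenever it is not None.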

2. Using Stream Mode in curl

By default, curl buffers the output stream. Passing the -N (or --no-buffer) option disables this buffering, so chunks of data are printed to the terminal as soon as they arrive.

    curl -N -s \
      --request POST \
      --url https://gateway.mytokengate.com/v1/chat/completions \
      --header 'Authorization: Bearer your-api-key' \
      --header 'Content-Type: application/json' \
      --data '{
        "model": "gpt-4o",
        "messages": [
          {"role":"user","content":"Tell me a story"}
        ],
        "stream": true
      }'