Text Generation - MyTokenGate

Language Model (LLM) User Manual

1. Model Core Capabilities

1.1 Basic Functions

Text Generation: Generate coherent natural language text based on context, supporting various styles and genres.

Semantic Understanding: Parse user intent and manage multi-turn dialogue so that conversations stay coherent and accurate.

Knowledge Q&A: Answer questions across a wide range of domains, including science, technology, culture, and history.

Code Assistance: Support code generation, explanation, and debugging for multiple mainstream programming languages (such as Python, Java, C++, etc.).

1.2 Advanced Capabilities

Long Text Processing: Support context windows of 4k to 200k tokens, suitable for long document generation and complex dialogue scenarios.

Instruction Following: Precisely understand complex task instructions, such as “compare A/B schemes using a Markdown table.”

Style Control: Adjust output style through system prompts, supporting various styles such as academic, conversational, and poetry.

Multimodal Support: In addition to text generation, support tasks such as image description and speech-to-text.

2. API Call Specifications

2.1 Basic Request Structure

You can make end-to-end API requests using the OpenAI SDK.

Generate Dialogue


from openai import OpenAI
 
client = OpenAI(api_key="YOUR_KEY", base_url="https://gateway.mytokengate.com/v1")
 
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about recursion in programming."}
    ],
    temperature=0.7,
    max_tokens=1024,
    stream=True
)
 
for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Analyze an Image


from openai import OpenAI
 
client = OpenAI(api_key="YOUR_KEY", base_url="https://gateway.mytokengate.com/v1")
 
response = client.chat.completions.create(
    model="gemini-2.5-flash-image",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/image.png"
                    }
                },
                {
                    "type": "text",
                    "text": "What's in this image?"
                }
            ]
        }
    ],
    temperature=0.7,
    max_tokens=1024,
    stream=True
)
 
for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Generate JSON Data


import json
from openai import OpenAI
 
client = OpenAI(
    api_key="YOUR_KEY",
    base_url="https://gateway.mytokengate.com/v1"
)
 
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant designed to output JSON."},
        {"role": "user", "content": "Who were the men's and women's singles champions of the 2020 Olympic table tennis? Respond in JSON format."}
    ],
    response_format={"type": "json_object"}
)
 
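# The message content is a JSON string; parse it before use.
data = json.loads(response.choices[0].message.content)
print(json.dumps(data, indent=2, ensure_ascii=False))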

2.2 Message Structure Explanation

Message Type	Function Description	Example Content
system	Model instructions, defining the AI’s role and general behavior	e.g., “You are a pediatrician with 10 years of experience.”
user	User input, passing the end user’s message to the model	e.g., “How should a persistent fever in a toddler be treated?”
assistant	Model-generated historical responses, providing examples of how it should respond to the current request	e.g., “I suggest measuring the temperature first…”
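
The three roles combine into one messages list that grows with each turn. The following is a minimal sketch of that bookkeeping; `build_conversation` and `append_turn` are hypothetical helpers written for this illustration, not part of the OpenAI SDK, and the follow-up question is made-up sample text.

```python
def build_conversation(system_prompt):
    """Start a conversation with a single system message."""
    return [{"role": "system", "content": system_prompt}]


def append_turn(messages, user_text, assistant_text=None):
    """Append a user message and, if known, the assistant reply that followed it."""
    messages.append({"role": "user", "content": user_text})
    if assistant_text is not None:
        messages.append({"role": "assistant", "content": assistant_text})
    return messages


messages = build_conversation("You are a pediatrician with 10 years of experience.")
# A completed exchange: the user question plus the model's earlier answer.
append_turn(
    messages,
    "How should a persistent fever in a toddler be treated?",
    "I suggest measuring the temperature first…",
)
# The newest user message goes last; the model answers it with the history as context.
append_turn(messages, "What temperature counts as a fever?")
```

The resulting list can be passed as the `messages` argument of `client.chat.completions.create`.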

3. Model Selection Guide

Claude Series

claude-opus-4-6 - Strongest reasoning capability
claude-sonnet-4-6 - Balanced performance and cost
claude-haiku-4-5-20251001 - Fast response

GPT Series

gpt-4o - Multimodal capability
gpt-4.1 - Enhanced version
gpt-5 series - Latest models

Gemini Series

gemini-2.5-pro - Complex tasks
gemini-2.5-flash - Fast response
gemini-3.1-pro-preview - Latest preview

4. Detailed Explanation of Core Parameters

4.1 Creativity Control


# Temperature parameter (0.0~2.0)
temperature=0.5  # Balances creativity and reliability
 
# Nucleus sampling (top_p)
top_p=0.9  # Samples only from the smallest set of tokens whose cumulative probability reaches 90%

4.2 Output Limits


max_tokens=1000  # Maximum generation length per request
stop=["\n##", "<|end|>"]  # Stop sequences
frequency_penalty=0.5  # Suppresses repetitive word usage (-2.0~2.0)
stream=True  # Stream output
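
Checking values on the client before sending a request can catch out-of-range parameters early. The sketch below uses the ranges given in the comments above for temperature and frequency_penalty; the 0.0 to 1.0 range for top_p is an assumption based on its definition as a cumulative probability, and `check_sampling_params` is a hypothetical helper, not an SDK function.

```python
def check_sampling_params(temperature=None, top_p=None, frequency_penalty=None):
    """Return a list of problems; an empty list means every given value is in range."""
    limits = {
        "temperature": (temperature, 0.0, 2.0),
        "top_p": (top_p, 0.0, 1.0),  # assumed range, not stated in this manual
        "frequency_penalty": (frequency_penalty, -2.0, 2.0),
    }
    problems = []
    for name, (value, low, high) in limits.items():
        # Parameters left as None are skipped so the API default applies.
        if value is not None and not (low <= value <= high):
            problems.append(f"{name}={value} is outside [{low}, {high}]")
    return problems


print(check_sampling_params(temperature=2.5, top_p=0.9))
```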

4.3 Common Issues in Language Model Scenarios

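Garbled Model Output

Garbled output often points to overly loose sampling settings. Try adjusting temperature, top_k, top_p, and frequency_penalty; lower temperature and top_p values make the output more conservative. The request body below shows one starting point. Note that top_k is not a named argument of the OpenAI SDK's chat.completions.create, so pass it through extra_body when using the SDK.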


payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "user", "content": "1+1=?"}
    ],
    "max_tokens": 200,
    "temperature": 0.7,
    "top_k": 50,
    "top_p": 0.7,
    "frequency_penalty": 0
}

Explanation of max_tokens

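max_tokens caps the number of tokens the model may generate in one request. Input and output share the model's context window, so it is recommended to reserve around 10k tokens for input content when choosing this value.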
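
One reading of this recommendation is to cap max_tokens at the context window minus the reserved input space. The sketch below assumes that interpretation; `output_budget` is a hypothetical helper, and the window sizes in the usage lines are illustrative values, not limits of any specific model.

```python
def output_budget(context_window, desired_output, reserved_input=10_000):
    """Cap max_tokens so the reserved input space plus the output fits in the context window."""
    available = context_window - reserved_input
    if available <= 0:
        raise ValueError("context window is smaller than the reserved input space")
    return min(desired_output, available)


# A 16k window leaves 6k tokens for output once 10k are reserved for input.
small_window_budget = output_budget(16_000, desired_output=8_000)
# A 200k window leaves plenty of room, so the desired value is kept.
large_window_budget = output_budget(200_000, desired_output=4_096)
```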

Output Truncation Issues

Set max_tokens to an appropriate value
Use stream requests for long outputs
Increase client timeout
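
Truncation can also be detected in code. In the OpenAI chat completions format, a choice that stopped because it reached the token limit reports a finish_reason of "length". The sketch below assumes the gateway passes that field through unchanged for every model; `was_truncated` is a hypothetical helper, and the stand-in object only mimics the attribute layout of a non-streaming SDK response.

```python
from types import SimpleNamespace


def was_truncated(response):
    """Return True if any choice stopped because it hit the max_tokens limit."""
    return any(choice.finish_reason == "length" for choice in response.choices)


# Stand-in objects with the same attribute layout as a non-streaming SDK response.
cut_off = SimpleNamespace(choices=[SimpleNamespace(finish_reason="length")])
complete = SimpleNamespace(choices=[SimpleNamespace(finish_reason="stop")])

print(was_truncated(cut_off), was_truncated(complete))
```

When the check returns True, a larger max_tokens value or a follow-up request asking the model to continue are the usual remedies.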

Error Code Handling

Error Code	Common Cause	Solution
400	Parameter format error	Check parameter ranges
401	API Key not correctly set	Verify the API Key
403	Insufficient permissions	Refer to error messages
429	Request rate limit exceeded	Implement exponential backoff retry
503/504	Model overload	Switch to backup model nodes
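
The exponential backoff suggested for 429 errors can be sketched as follows. `retry_with_backoff` and `is_retryable` are hypothetical helpers; the sketch assumes the raised exception exposes the HTTP status as a `status_code` attribute, as the OpenAI SDK's APIStatusError does, and treating 503 and 504 as retryable is a choice made for this example.

```python
import random
import time


def retry_with_backoff(call, should_retry, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call(); on a retryable error wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            # Re-raise non-retryable errors, and the last error once attempts run out.
            if not should_retry(exc) or attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)
            # Up to 10% random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))


def is_retryable(exc):
    """Retry on rate limiting (429) and overload (503/504); assumes a status_code attribute."""
    return getattr(exc, "status_code", None) in (429, 503, 504)
```

A request would then be wrapped as `retry_with_backoff(lambda: client.chat.completions.create(...), is_retryable)`. Errors such as 400 and 401 are raised immediately, since retrying does not fix a malformed request or a bad key.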

5. Billing and Quota Management

5.1 Billing Formula

Total Cost = (Input Tokens × Input Unit Price) + (Output Tokens × Output Unit Price)
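
The formula can be applied directly to the token counts of a request. In the sketch below, `request_cost` is a hypothetical helper, unit prices are assumed to be quoted per one million tokens, and the prices shown are placeholders rather than actual MyTokenGate rates.

```python
def request_cost(input_tokens, output_tokens, input_price_per_million, output_price_per_million):
    """Apply the billing formula with unit prices quoted per one million tokens."""
    input_cost = input_tokens * input_price_per_million / 1_000_000
    output_cost = output_tokens * output_price_per_million / 1_000_000
    return input_cost + output_cost


# Placeholder prices for illustration only.
cost = request_cost(
    input_tokens=1_200,
    output_tokens=800,
    input_price_per_million=2.50,
    output_price_per_million=10.00,
)
print(f"{cost:.4f}")
```

With the OpenAI SDK, the token counts for a non-streaming response are available as `response.usage.prompt_tokens` and `response.usage.completion_tokens`.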

6. Application Scenarios

6.1 Technical Documentation Generation


from openai import OpenAI
 
client = OpenAI(api_key="YOUR_KEY", base_url="https://gateway.mytokengate.com/v1")
 
response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{
        "role": "user",
        "content": "Write a Python tutorial on asynchronous web scraping, including code examples and precautions."
    }],
    temperature=0.7,
    max_tokens=4096
)

# Print the generated tutorial.
print(response.choices[0].message.content)

6.2 Data Analysis Reports


from openai import OpenAI
 
client = OpenAI(api_key="YOUR_KEY", base_url="https://gateway.mytokengate.com/v1")
 
response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[
        {"role": "system", "content": "You are a data analysis expert. Output results in Markdown."},
        {"role": "user", "content": "Analyze the sales trends of new energy vehicles in 2023."}
    ],
    temperature=0.7,
    max_tokens=4096
)

# Print the generated Markdown report.
print(response.choices[0].message.content)