Responses API (Recommended)

Creates a model response. The endpoint accepts text and image input and generates text or JSON output. It supports Function Calling, streaming responses, and multi-turn conversations.

Recommended for new projects. This is OpenAI’s next-generation API and offers the following advantages over Chat Completions:

Native Prompt Caching — instructions and input are separated, so system instructions automatically serve as the cache prefix. The unchanged prefix in multi-turn conversations achieves higher cache hit rates, saving up to 50% on input token costs while reducing latency.
Structured item model — Cleaner input/output format with native support for tool calling flows.
Richer streaming events — Fine-grained SSE event types make real-time UI rendering easier.
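To illustrate the caching point above, here is a minimal sketch that keeps `instructions` identical across requests while only `input` changes. `build_request` and `SYSTEM_INSTRUCTIONS` are hypothetical names, and the sketch only builds keyword arguments for `client.responses.create` without calling the API. How the cache matches prefixes is decided by the provider, so treat this as an illustration rather than a guarantee of cache hits.

```python
# Hypothetical helper: keeps `instructions` constant so the cacheable
# prefix is identical on every request; only `input` varies.
SYSTEM_INSTRUCTIONS = "You are a helpful technical assistant. Respond in English."


def build_request(user_text: str, model: str = "openai/gpt-5.4-mini") -> dict:
    """Return keyword arguments for client.responses.create()."""
    return {
        "model": model,
        "instructions": SYSTEM_INSTRUCTIONS,  # stable across requests
        "input": user_text,                   # the only part that changes
    }


first = build_request("Explain what an API Gateway is")
second = build_request("Explain what a reverse proxy is")
print(first["instructions"] == second["instructions"])  # True
```

Interpolating per-request values (timestamps, user names) into `instructions` would change the prefix on every call, so such values are probably better placed in `input`.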

Endpoint


POST https://api.hao.ai/v1/responses

Request Parameters

Parameter	Type	Required	Description
`model`	string	✅	Model identifier, e.g. `openai/gpt-5.4-mini`
`input`	string \| array	✅	Input content, either a plain text string or a structured message array
`instructions`	string	—	System instructions (separate from `input`, automatically benefits from Prompt Caching)
`stream`	boolean	—	Whether to enable SSE streaming response, default `false`
`max_output_tokens`	number	—	Maximum number of tokens to generate
`temperature`	number	—	Sampling temperature, from 0 to 2 (default `1`)
`top_p`	number	—	Nucleus sampling parameter
`tools`	array	—	Available tool definitions (Function Calling)
`tool_choice`	string \| object	—	Tool selection strategy: `auto`, `none`, or a specific tool
`truncation`	string	—	Truncation strategy: `auto` truncates the input automatically when it exceeds the context window; `disabled` (default) returns an error instead
`text`	object	—	Text generation format configuration
`store`	boolean	—	Whether to store the response (default `true`)
`metadata`	object	—	Custom metadata key-value pairs
`provider`	object	—	HaoAI extension: routing and fallback config
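The constraints in the table can be checked on the client before a request is sent. The following is a rough sketch under that assumption: `validate_params` is a hypothetical helper, it covers only the rules stated in the table (required fields, the `temperature` range, the `truncation` values, the `stream` type), and the server remains the authority on what is valid.

```python
def validate_params(params: dict) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    # `model` and `input` are the only required parameters in the table.
    for name in ("model", "input"):
        if name not in params:
            problems.append(f"missing required parameter: {name}")
    temperature = params.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        problems.append("temperature must be between 0 and 2")
    truncation = params.get("truncation")
    if truncation is not None and truncation not in ("auto", "disabled"):
        problems.append("truncation must be 'auto' or 'disabled'")
    if "stream" in params and not isinstance(params["stream"], bool):
        problems.append("stream must be a boolean")
    return problems


print(validate_params({"model": "openai/gpt-5.4-mini", "input": "Hi", "temperature": 0.2}))
# []
print(validate_params({"input": "Hi", "temperature": 3}))
# ['missing required parameter: model', 'temperature must be between 0 and 2']
```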

Input Format

`input` supports two formats:

1. Simple string — pass text directly


{
  "input": "Hello, please introduce yourself"
}

2. Structured message array — multi-turn conversations and multimodal input


interface InputItem {
  type: 'message'
  role: 'user' | 'assistant'
  content: ContentPart[]
  id?: string               // Required for assistant messages
  status?: 'completed'      // Required for assistant messages
}

type ContentPart =
  | { type: 'input_text'; text: string }           // User text input
  | { type: 'input_image'; image_url: string }     // Image input
  | { type: 'output_text'; text: string; annotations?: any[] }  // Assistant text output

When including `assistant` role messages in multi-turn conversations, the `id` and `status` fields are required. The Responses API is stateless by design, so each request must carry the full conversation history.
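As a sketch of the `InputItem` shape above, the two builders below produce user and assistant items as plain dictionaries. `user_message` and `assistant_message` are hypothetical helper names, and the image URL is a placeholder; the field names follow the interface shown above.

```python
from typing import Optional


def user_message(text: str, image_url: Optional[str] = None) -> dict:
    """Build a user item with text and, optionally, one image."""
    content = [{"type": "input_text", "text": text}]
    if image_url is not None:
        content.append({"type": "input_image", "image_url": image_url})
    return {"type": "message", "role": "user", "content": content}


def assistant_message(message_id: str, text: str) -> dict:
    """Build an assistant item; `id` and `status` are required for this role."""
    return {
        "type": "message",
        "role": "assistant",
        "id": message_id,
        "status": "completed",
        "content": [{"type": "output_text", "text": text, "annotations": []}],
    }


history = [
    user_message("What is in this picture?", image_url="https://example.com/cat.png"),
    assistant_message("msg_abc123", "A cat sitting on a sofa."),
    user_message("What colour is the sofa?"),
]
print(len(history))  # 3
```

The resulting `history` list can be passed as the `input` argument of `client.responses.create`.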

Request Examples

cURL

Terminal


curl https://api.hao.ai/v1/responses \
  -H "Authorization: Bearer $HAOAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.4-mini",
    "input": "Explain what an API Gateway is",
    "instructions": "You are a helpful technical assistant. Respond in English.",
    "max_output_tokens": 1024
  }'

Python

responses.py


from openai import OpenAI
 
client = OpenAI(
    base_url="https://api.hao.ai/v1",
    api_key="<your HAOAI_API_KEY>"
)
 
response = client.responses.create(
    model="openai/gpt-5.4-mini",
    input="Explain what an API Gateway is",
    instructions="You are a helpful technical assistant. Respond in English.",
    max_output_tokens=1024
)
 
print(response.output_text)

TypeScript

responses.ts


import OpenAI from 'openai'
 
const client = new OpenAI({
  baseURL: 'https://api.hao.ai/v1',
  apiKey: '<your HAOAI_API_KEY>'
})
 
const response = await client.responses.create({
  model: 'openai/gpt-5.4-mini',
  input: 'Explain what an API Gateway is',
  instructions: 'You are a helpful technical assistant. Respond in English.',
  max_output_tokens: 1024
})
 
console.log(response.output_text)

Response Format


{
  "id": "resp_abc123",
  "object": "response",
  "created_at": 1703123456,
  "model": "openai/gpt-5.4-mini",
  "status": "completed",
  "output": [
    {
      "type": "message",
      "id": "msg_def456",
      "status": "completed",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "An API Gateway is a...",
          "annotations": []
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 25,
    "output_tokens": 150,
    "total_tokens": 175
  }
}

Response Field Reference

Field	Type	Description
`id`	string	Unique response identifier, prefixed with `resp_`
`object`	string	Fixed value `"response"`
`created_at`	number	Creation timestamp (Unix seconds)
`model`	string	The model ID actually used
`status`	string	Response status: `completed`, `failed`, `in_progress`, `cancelled`
`output`	array	Output item array, containing messages and tool calls
`usage`	object	Token usage statistics
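When working with the raw JSON rather than an SDK object, the generated text has to be collected from the `output` array. A minimal sketch, assuming the response has been parsed into a dictionary shaped like the example above (`extract_text` is a hypothetical helper name):

```python
def extract_text(response: dict) -> str:
    """Concatenate every output_text part found in message items."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip tool calls and any other item types
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part["text"])
    return "".join(parts)


sample = {
    "id": "resp_abc123",
    "object": "response",
    "status": "completed",
    "output": [
        {
            "type": "message",
            "id": "msg_def456",
            "status": "completed",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": "An API Gateway is a...", "annotations": []}
            ],
        }
    ],
    "usage": {"input_tokens": 25, "output_tokens": 150, "total_tokens": 175},
}

print(extract_text(sample))           # An API Gateway is a...
print(sample["usage"]["total_tokens"])  # 175
```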

Structured Message Input

Use a structured message array for multi-turn conversations:

Python

multi_turn.py


# Reuses the `client` configured in responses.py above
response = client.responses.create(
    model="openai/gpt-5.4-mini",
    input=[
        {
            "type": "message",
            "role": "user",
            "content": [
                {"type": "input_text", "text": "What is the capital of France?"}
            ]
        },
        {
            "type": "message",
            "role": "assistant",
            "id": "msg_abc123",
            "status": "completed",
            "content": [
                {"type": "output_text", "text": "The capital of France is Paris.", "annotations": []}
            ]
        },
        {
            "type": "message",
            "role": "user",
            "content": [
                {"type": "input_text", "text": "How many people live there?"}
            ]
        }
    ]
)
 
print(response.output_text)
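Because every request carries the whole conversation, the model's reply has to be appended to the history before the next turn. The sketch below assumes that a message item from a response's `output` array can be sent back unchanged as an input item, which matches the shapes shown in this document (it already has `id`, `status`, and `role: "assistant"`). `next_input` is a hypothetical helper that works on parsed JSON dictionaries.

```python
def next_input(history: list, response: dict, user_text: str) -> list:
    """Return a new input array: prior history, the model's messages, and the next user turn."""
    carried = [item for item in response.get("output", []) if item.get("type") == "message"]
    new_turn = {
        "type": "message",
        "role": "user",
        "content": [{"type": "input_text", "text": user_text}],
    }
    return [*history, *carried, new_turn]


history = [
    {
        "type": "message",
        "role": "user",
        "content": [{"type": "input_text", "text": "What is the capital of France?"}],
    }
]
response = {
    "output": [
        {
            "type": "message",
            "id": "msg_abc123",
            "status": "completed",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": "The capital of France is Paris.", "annotations": []}
            ],
        }
    ]
}

conversation = next_input(history, response, "How many people live there?")
print([item["role"] for item in conversation])  # ['user', 'assistant', 'user']
```

The helper returns a new list instead of mutating `history`, which makes it easier to retry a failed request with the same input.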

Streaming

Set `stream: true` to enable SSE streaming responses:

Python

stream.py


# Reuses the `client` configured in responses.py above
stream = client.responses.create(
    model="openai/gpt-5.4-mini",
    input="Tell me a joke about programming",
    stream=True
)
 
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)

Streaming Event Types

Streaming responses send the following events via SSE:


data: {"type":"response.created","response":{"id":"resp_abc123","object":"response","status":"in_progress"}}
 
data: {"type":"response.output_item.added","output_index":0,"item":{"type":"message","id":"msg_def456","role":"assistant","status":"in_progress","content":[]}}
 
data: {"type":"response.content_part.added","output_index":0,"content_index":0,"part":{"type":"output_text","text":""}}
 
data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":"Hello"}
 
data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":" there"}
 
data: {"type":"response.output_item.done","output_index":0,"item":{"type":"message","id":"msg_def456","role":"assistant","status":"completed","content":[{"type":"output_text","text":"Hello there..."}]}}
 
data: {"type":"response.completed","response":{"id":"resp_abc123","object":"response","status":"completed","usage":{"input_tokens":12,"output_tokens":45,"total_tokens":57}}}
 
data: [DONE]

Event Type	Description
`response.created`	Response object created
`response.output_item.added`	New output item added
`response.content_part.added`	New content part added
`response.output_text.delta`	Text delta (token-by-token output)
`response.output_item.done`	Output item finished
`response.completed`	Response fully completed
`response.function_call_arguments.delta`	Function call arguments delta
`response.function_call_arguments.done`	Function call arguments finished
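For clients that read the SSE stream without an SDK, the text can be rebuilt by concatenating the `delta` values of `response.output_text.delta` events. A minimal sketch, assuming the `data:` framing and the `[DONE]` terminator shown above (`collect_text` is a hypothetical helper name; it takes any iterable of already-decoded lines):

```python
import json


def collect_text(sse_lines) -> str:
    """Accumulate text deltas from raw SSE lines until the [DONE] marker."""
    chunks = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separator lines and any other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        if event.get("type") == "response.output_text.delta":
            chunks.append(event["delta"])
    return "".join(chunks)


lines = [
    'data: {"type":"response.created","response":{"id":"resp_abc123","object":"response","status":"in_progress"}}',
    "",
    'data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":"Hello"}',
    "",
    'data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":" there"}',
    "",
    "data: [DONE]",
]

print(collect_text(lines))  # Hello there
```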

Function Calling

The Responses API natively supports tool calling:

Python

tools.py


# Reuses the `client` configured in responses.py above
response = client.responses.create(
    model="openai/gpt-5.4-mini",
    input="What's the weather like in Beijing today?",
    tools=[
        {
            "type": "function",
            "name": "get_weather",
            "description": "Get the current weather for the specified city",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g. Beijing"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location"]
            }
        }
    ],
    tool_choice="auto"
)
 
# Handle tool calls
for item in response.output:
    if item.type == "function_call":
        print(f"Calling function: {item.name}")
        print(f"Arguments: {item.arguments}")

Tool Call Response Format

When the model calls a tool, `output` contains an item of type `function_call`:


{
  "id": "resp_abc123",
  "object": "response",
  "status": "completed",
  "output": [
    {
      "type": "function_call",
      "id": "fc_abc123",
      "call_id": "call_xyz789",
      "name": "get_weather",
      "arguments": "{\"location\":\"Beijing\",\"unit\":\"celsius\"}"
    }
  ],
  "usage": {
    "input_tokens": 45,
    "output_tokens": 25,
    "total_tokens": 70
  }
}
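Note that `arguments` arrives as a JSON-encoded string, so it has to be parsed before the local function can be called. The sketch below shows one way to dispatch a `function_call` item; `TOOLS`, `run_function_call`, and the canned `get_weather` implementation are hypothetical and stand in for real application code.

```python
import json


def get_weather(location: str, unit: str = "celsius") -> dict:
    """Placeholder implementation that returns canned data."""
    return {"temperature": "22°C", "condition": "sunny"}


# Maps the tool names declared in `tools` to local Python callables.
TOOLS = {"get_weather": get_weather}


def run_function_call(item: dict) -> str:
    """Execute the tool named in a function_call item and return its result as a JSON string."""
    if item["name"] not in TOOLS:
        raise ValueError(f"unknown tool: {item['name']}")
    arguments = json.loads(item["arguments"])  # `arguments` is a JSON string
    result = TOOLS[item["name"]](**arguments)
    return json.dumps(result, ensure_ascii=False)


call_item = {
    "type": "function_call",
    "id": "fc_abc123",
    "call_id": "call_xyz789",
    "name": "get_weather",
    "arguments": "{\"location\":\"Beijing\",\"unit\":\"celsius\"}",
}

print(run_function_call(call_item))
```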

Submitting Tool Results

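Send the tool execution result back to the model by including the full call chain in `input`: the original user message, the `function_call` item returned by the model, and a `function_call_output` item with the same `call_id`: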


# Second request: submit the tool result
response = client.responses.create(
    model="openai/gpt-5.4-mini",
    input=[
        {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": "What's the weather like in Beijing today?"}]
        },
        {
            "type": "function_call",
            "id": "fc_abc123",
            "call_id": "call_xyz789",
            "name": "get_weather",
            "arguments": "{\"location\":\"Beijing\",\"unit\":\"celsius\"}"
        },
        {
            "type": "function_call_output",
            "id": "fco_abc123",
            "call_id": "call_xyz789",
            "output": "{\"temperature\":\"22°C\",\"condition\":\"sunny\"}"
        }
    ]
)

print(response.output_text)
# => "It's sunny in Beijing today, 22°C — perfect for outdoor activities."
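The follow-up input can also be assembled programmatically. In this sketch, `with_tool_result` is a hypothetical helper that appends the model's `function_call` item and a matching `function_call_output` item to the previous input. The source does not say how the `id` of the output item is generated, so it is an optional argument here and is omitted when not supplied; whether the gateway accepts the item without it is an assumption.

```python
from typing import Optional


def with_tool_result(
    previous_input: list,
    call_item: dict,
    result: str,
    output_id: Optional[str] = None,
) -> list:
    """Return the input for the second request: history, the call, and its result."""
    output_item = {
        "type": "function_call_output",
        "call_id": call_item["call_id"],  # must match the function_call item
        "output": result,
    }
    if output_id is not None:
        output_item["id"] = output_id
    return [*previous_input, call_item, output_item]


previous_input = [
    {
        "type": "message",
        "role": "user",
        "content": [{"type": "input_text", "text": "What's the weather like in Beijing today?"}],
    }
]
call_item = {
    "type": "function_call",
    "id": "fc_abc123",
    "call_id": "call_xyz789",
    "name": "get_weather",
    "arguments": "{\"location\":\"Beijing\",\"unit\":\"celsius\"}",
}

second_input = with_tool_result(
    previous_input, call_item, "{\"temperature\":\"22°C\",\"condition\":\"sunny\"}", output_id="fco_abc123"
)
print([item["type"] for item in second_input])
# ['message', 'function_call', 'function_call_output']
```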

Tool Choice Options

Value	Description
`"auto"`	The model decides whether to call a tool (default)
`"none"`	Disallow tool calls
`{"type": "function", "name": "tool_name"}`	Force a specific tool to be called
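When forcing a specific tool, a typo in the name is easy to miss. The sketch below builds the object form of `tool_choice` and checks the name against the `tools` list first; `force_tool` is a hypothetical helper, and the check is a client-side convenience rather than something the API requires.

```python
def force_tool(name: str, tools: list) -> dict:
    """Build a tool_choice value that forces `name`, after checking it is defined."""
    defined = {tool["name"] for tool in tools if tool.get("type") == "function"}
    if name not in defined:
        raise ValueError(f"tool {name!r} is not defined in tools")
    return {"type": "function", "name": name}


tools = [
    {
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather for the specified city",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    }
]

print(force_tool("get_weather", tools))
# {'type': 'function', 'name': 'get_weather'}
```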

Comparison with Chat Completions

Feature	Chat Completions	Responses API
Endpoint	`/v1/chat/completions`	`/v1/responses`
Input format	`messages` array	`input` string or structured item array
System instructions	`role: "system"` message	`instructions` parameter (independently cached)
Prompt Caching	System instructions are mixed into messages, cache prefix is unstable	`instructions` passed separately, automatically cached, higher hit rate
Output format	`choices[0].message.content`	`output[0].content[0].text`, or the SDK's `output_text` convenience property
Tool calls	`tool_calls` inside message	Dedicated `function_call` output item
Tool results	`role: "tool"` message	`function_call_output` input item
Streaming events	`chat.completion.chunk`	Structured event types (`response.*`)
Token fields	`prompt_tokens` / `completion_tokens`	`input_tokens` / `output_tokens`

Both APIs are production-ready. If you already have a Chat Completions integration, no migration is required. We recommend the Responses API for new projects, especially for use cases with complex tool calling flows or high request volumes (where caching can significantly reduce costs). See the Function Calling guide for details.
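For teams that do choose to migrate, the input mapping in the table can be sketched as a small converter. `chat_to_responses` is a hypothetical helper that handles only plain-string `content`; it moves `system` messages into `instructions` and turns the rest into structured items. The `id` values it generates for assistant messages (`msg_migrated_N`) are made up for illustration, since Chat Completions history has no such field, and whether the gateway accepts arbitrary IDs is an assumption.

```python
def chat_to_responses(messages: list) -> dict:
    """Convert a Chat Completions `messages` array into Responses API arguments."""
    instructions = []
    items = []
    for index, message in enumerate(messages):
        role, text = message["role"], message["content"]
        if role == "system":
            instructions.append(text)
        elif role == "user":
            items.append({
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": text}],
            })
        elif role == "assistant":
            items.append({
                "type": "message",
                "role": "assistant",
                "id": f"msg_migrated_{index}",  # synthetic ID, an assumption
                "status": "completed",
                "content": [{"type": "output_text", "text": text, "annotations": []}],
            })
        else:
            raise ValueError(f"unsupported role: {role}")
    request = {"input": items}
    if instructions:
        request["instructions"] = "\n\n".join(instructions)
    return request


converted = chat_to_responses([
    {"role": "system", "content": "You are a helpful technical assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "How many people live there?"},
])
print(converted["instructions"])  # You are a helpful technical assistant.
print(len(converted["input"]))    # 3
```

Tool messages (`role: "tool"`) are deliberately rejected here, because they map to `function_call_output` items and need the matching `call_id`.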