Responses API (Recommended)
Create model responses. Supports text and image input, generating text or JSON output. Supports Function Calling, streaming responses, and multi-turn conversations.
Recommended for new projects. This is OpenAI’s next-generation API and offers the following advantages over Chat Completions:
- Native Prompt Caching: instructions and input are separated, so system instructions automatically serve as the cache prefix. The unchanged prefix in multi-turn conversations achieves higher cache hit rates, saving up to 50% on input token costs while reducing latency.
- Structured item model: Cleaner input/output format with native support for tool calling flows.
- Richer streaming events: Fine-grained SSE event types make real-time UI rendering easier.
Endpoint
POST https://api.hao.ai/v1/responses
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | ✅ | Model identifier, e.g. openai/gpt-5.4-mini |
| input | string \| array | ✅ | Input content, either a plain text string or a structured message array |
| instructions | string | — | System instructions (separate from input, automatically benefits from Prompt Caching) |
| stream | boolean | — | Whether to enable SSE streaming response, default false |
| max_output_tokens | number | — | Maximum number of tokens to generate |
| temperature | number | — | Sampling temperature 0-2, default 1 |
| top_p | number | — | Nucleus sampling parameter |
| tools | array | — | Available tool definitions (Function Calling) |
| tool_choice | string \| object | — | Tool selection strategy: auto, none, or a specific tool |
| truncation | string | — | Truncation strategy: auto to truncate automatically / disabled to error on overflow (default) |
| text | object | — | Text generation format configuration |
| store | boolean | — | Whether to store the response (default true) |
| metadata | object | — | Custom metadata key-value pairs |
| provider | object | — | HaoAI extension: routing and fallback config |
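The parameter table can be turned into a small request builder. The following is a minimal sketch, not part of any SDK: `build_request` is a hypothetical helper name, and the two validation rules are taken only from the ranges and values listed in the table above.

```python
import json


def build_request(model, user_input, instructions=None, **options):
    """Assemble a JSON body for POST /v1/responses with light validation.

    Hypothetical helper: it only checks constraints stated in the table.
    """
    if not isinstance(user_input, (str, list)):
        raise TypeError("input must be a string or a list of input items")

    temperature = options.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")

    truncation = options.get("truncation")
    if truncation is not None and truncation not in ("auto", "disabled"):
        raise ValueError("truncation must be 'auto' or 'disabled'")

    body = {"model": model, "input": user_input}
    if instructions is not None:
        body["instructions"] = instructions
    # Leave unset optional parameters out so the documented defaults apply.
    body.update({k: v for k, v in options.items() if v is not None})
    return body


body = build_request(
    "openai/gpt-5.4-mini",
    "Explain what an API Gateway is",
    instructions="You are a helpful technical assistant.",
    max_output_tokens=1024,
    temperature=0.7,
    stream=None,  # unset, so it is omitted from the body
)
print(json.dumps(body, indent=2))
```

Keeping instructions out of input in the builder mirrors the separation that the caching behavior described earlier relies on.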
Input Format
input supports two formats:
1. Simple string: pass text directly
{
  "input": "Hello, please introduce yourself"
}
2. Structured message array: multi-turn conversations and multimodal input
interface InputItem {
  type: 'message'
  role: 'user' | 'assistant'
  content: ContentPart[]
  id?: string            // Required for assistant messages
  status?: 'completed'   // Required for assistant messages
}
type ContentPart =
  | { type: 'input_text'; text: string }                         // User text input
  | { type: 'input_image'; image_url: string }                   // Image input
  | { type: 'output_text'; text: string; annotations?: any[] }   // Assistant text output
When including assistant role messages in multi-turn conversations, the id and status fields are required. The Responses API is stateless by design: each request must carry the full conversation history.
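Because every request must carry the whole history, it can help to build input items with small helpers. The sketch below assumes only the item shapes described above; `user_message` and `assistant_message` are hypothetical names, not SDK functions.

```python
def user_message(text, image_url=None):
    """Build a user input item with text and an optional image part."""
    content = [{"type": "input_text", "text": text}]
    if image_url is not None:
        content.append({"type": "input_image", "image_url": image_url})
    return {"type": "message", "role": "user", "content": content}


def assistant_message(text, message_id):
    """Build an assistant input item; id and status are required for this role."""
    return {
        "type": "message",
        "role": "assistant",
        "id": message_id,
        "status": "completed",
        "content": [{"type": "output_text", "text": text, "annotations": []}],
    }


# The full history is resent on each turn, ending with the newest user message.
history = [
    user_message("What is the capital of France?"),
    assistant_message("The capital of France is Paris.", "msg_abc123"),
    user_message("How many people live there?"),
]
```

In practice the assistant item would be copied from the previous response's output rather than rebuilt by hand, so that its id matches what the API returned.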
Request Examples
cURL
curl https://api.hao.ai/v1/responses \
  -H "Authorization: Bearer $HAOAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.4-mini",
    "input": "Explain what an API Gateway is",
    "instructions": "You are a helpful technical assistant. Respond in English.",
    "max_output_tokens": 1024
  }'
Response Format
{
  "id": "resp_abc123",
  "object": "response",
  "created_at": 1703123456,
  "model": "openai/gpt-5.4-mini",
  "status": "completed",
  "output": [
    {
      "type": "message",
      "id": "msg_def456",
      "status": "completed",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "An API Gateway is a...",
          "annotations": []
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 25,
    "output_tokens": 150,
    "total_tokens": 175
  }
}
Response Field Reference
| Field | Type | Description |
|---|---|---|
| id | string | Unique response identifier, prefixed with resp_ |
| object | string | Fixed value "response" |
| created_at | number | Creation timestamp (Unix seconds) |
| model | string | The model ID actually used |
| status | string | Response status: completed, failed, in_progress, cancelled |
| output | array | Output item array, containing messages and tool calls |
| usage | object | Token usage statistics |
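When working with the raw JSON rather than an SDK object, the text has to be collected from the output array. The following sketch assumes only the response shape shown above; `extract_text` is a hypothetical helper, similar in spirit to the output_text convenience property used in the Python examples.

```python
def extract_text(response):
    """Concatenate every output_text part across message items."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip function_call and any other item types
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part["text"])
    return "".join(parts)


# Sample payload copied from the response format above.
sample = {
    "id": "resp_abc123",
    "object": "response",
    "status": "completed",
    "output": [
        {
            "type": "message",
            "id": "msg_def456",
            "status": "completed",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": "An API Gateway is a...", "annotations": []}
            ],
        }
    ],
    "usage": {"input_tokens": 25, "output_tokens": 150, "total_tokens": 175},
}

if sample["status"] == "completed":
    print(extract_text(sample))
```

Checking status before reading output avoids treating a failed or in_progress response as an empty answer.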
Structured Message Input
Use a structured message array for multi-turn conversations:
Python
import os
from openai import OpenAI

# Assumes the OpenAI Python SDK, pointed at the HaoAI endpoint shown above.
client = OpenAI(
    base_url="https://api.hao.ai/v1",
    api_key=os.environ["HAOAI_API_KEY"],
)

response = client.responses.create(
    model="openai/gpt-5.4-mini",
    input=[
        {
            "type": "message",
            "role": "user",
            "content": [
                {"type": "input_text", "text": "What is the capital of France?"}
            ]
        },
        {
            # Assistant turns must carry id and status.
            "type": "message",
            "role": "assistant",
            "id": "msg_abc123",
            "status": "completed",
            "content": [
                {"type": "output_text", "text": "The capital of France is Paris.", "annotations": []}
            ]
        },
        {
            "type": "message",
            "role": "user",
            "content": [
                {"type": "input_text", "text": "How many people live there?"}
            ]
        }
    ]
)
print(response.output_text)
Streaming
Set stream: true to enable SSE streaming responses:
Python
stream = client.responses.create(
    model="openai/gpt-5.4-mini",
    input="Tell me a joke about programming",
    stream=True
)
for event in stream:
    # Print text as it arrives; other event types are ignored here.
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
Streaming Event Types
Streaming responses send the following events via SSE:
data: {"type":"response.created","response":{"id":"resp_abc123","object":"response","status":"in_progress"}}
data: {"type":"response.output_item.added","output_index":0,"item":{"type":"message","id":"msg_def456","role":"assistant","status":"in_progress","content":[]}}
data: {"type":"response.content_part.added","output_index":0,"content_index":0,"part":{"type":"output_text","text":""}}
data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":"Hello"}
data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":" there"}
data: {"type":"response.output_item.done","output_index":0,"item":{"type":"message","id":"msg_def456","role":"assistant","status":"completed","content":[{"type":"output_text","text":"Hello there..."}]}}
data: {"type":"response.completed","response":{"id":"resp_abc123","object":"response","status":"completed","usage":{"input_tokens":12,"output_tokens":45,"total_tokens":57}}}
data: [DONE]
| Event Type | Description |
|---|---|
| response.created | Response object created |
| response.output_item.added | New output item added |
| response.content_part.added | New content part added |
| response.output_text.delta | Text delta (token-by-token output) |
| response.output_item.done | Output item finished |
| response.completed | Response fully completed |
| response.function_call_arguments.delta | Function call arguments delta |
| response.function_call_arguments.done | Function call arguments finished |
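Without an SDK, the raw stream can be consumed line by line. The sketch below assumes the wire format shown in the sample above (each event on a `data:` line, terminated by `[DONE]`); `collect_text` is a hypothetical helper, and the input here is a list of strings rather than a live HTTP connection.

```python
import json


def collect_text(sse_lines):
    """Accumulate text deltas from raw SSE lines and capture final usage."""
    text = []
    usage = None
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # ignore blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        event = json.loads(payload)
        if event["type"] == "response.output_text.delta":
            text.append(event["delta"])
        elif event["type"] == "response.completed":
            usage = event["response"].get("usage")
    return "".join(text), usage


lines = [
    'data: {"type":"response.created","response":{"id":"resp_abc123","object":"response","status":"in_progress"}}',
    "",
    'data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":"Hello"}',
    'data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":" there"}',
    'data: {"type":"response.completed","response":{"id":"resp_abc123","object":"response","status":"completed","usage":{"input_tokens":12,"output_tokens":45,"total_tokens":57}}}',
    "data: [DONE]",
]
text, usage = collect_text(lines)
print(text, usage)
```

Dispatching on the type field keeps the handler open to the function call argument events listed in the table, which could be accumulated the same way.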
Function Calling
The Responses API natively supports tool calling:
Python
response = client.responses.create(
    model="openai/gpt-5.4-mini",
    input="What's the weather like in Beijing today?",
    tools=[
        {
            "type": "function",
            "name": "get_weather",
            "description": "Get the current weather for the specified city",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g. Beijing"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location"]
            }
        }
    ],
    tool_choice="auto"
)
# Handle tool calls
for item in response.output:
    if item.type == "function_call":
        print(f"Calling function: {item.name}")
        print(f"Arguments: {item.arguments}")
Tool Call Response Format
When the model calls a tool, output contains a function_call type item:
{
  "id": "resp_abc123",
  "object": "response",
  "status": "completed",
  "output": [
    {
      "type": "function_call",
      "id": "fc_abc123",
      "call_id": "call_xyz789",
      "name": "get_weather",
      "arguments": "{\"location\":\"Beijing\",\"unit\":\"celsius\"}"
    }
  ],
  "usage": {
    "input_tokens": 45,
    "output_tokens": 25,
    "total_tokens": 70
  }
}
Submitting Tool Results
Send the tool execution result back to the model by including the full call chain in input:
# Second request: submit the tool result
response = client.responses.create(
    model="openai/gpt-5.4-mini",
    input=[
        {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": "What's the weather like in Beijing today?"}]
        },
        {
            "type": "function_call",
            "id": "fc_abc123",
            "call_id": "call_xyz789",
            "name": "get_weather",
            "arguments": "{\"location\":\"Beijing\",\"unit\":\"celsius\"}"
        },
        {
            "type": "function_call_output",
            "id": "fco_abc123",
            "call_id": "call_xyz789",
            "output": "{\"temperature\":\"22°C\",\"condition\":\"sunny\"}"
        }
    ]
)
print(response.output_text)
# => "It's sunny in Beijing today, 22°C, perfect for outdoor activities."
Tool Choice Options
| Value | Description |
|---|---|
| "auto" | The model decides whether to call a tool (default) |
| "none" | Disallow tool calls |
| {"type": "function", "name": "tool_name"} | Force a specific tool to be called |
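The round trip described under Submitting Tool Results can be sketched as a local dispatcher that turns function_call items into the items to append to the next input. This is an illustration under assumptions: `TOOLS`, `run_tool_calls`, and the stub `get_weather` are hypothetical, and the id field that appears on function_call_output in the example above is omitted here because the source does not say how it is generated.

```python
import json


def get_weather(location, unit="celsius"):
    # Hypothetical stand-in for a real weather lookup.
    return {"temperature": "22°C", "condition": "sunny"}


# Maps tool names declared in `tools` to local Python callables.
TOOLS = {"get_weather": get_weather}


def run_tool_calls(output_items):
    """Execute function_call items; return items to append to the next input."""
    follow_up = []
    for item in output_items:
        if item.get("type") != "function_call":
            continue
        args = json.loads(item["arguments"])  # arguments arrive as a JSON string
        handler = TOOLS.get(item["name"])
        if handler is None:
            result = {"error": f"unknown tool: {item['name']}"}
        else:
            result = handler(**args)
        follow_up.append(item)  # echo the call so the chain stays complete
        follow_up.append({
            "type": "function_call_output",
            "call_id": item["call_id"],  # ties the result to its call
            "output": json.dumps(result, ensure_ascii=False),
        })
    return follow_up


output = [{
    "type": "function_call",
    "id": "fc_abc123",
    "call_id": "call_xyz789",
    "name": "get_weather",
    "arguments": "{\"location\":\"Beijing\",\"unit\":\"celsius\"}",
}]
next_items = run_tool_calls(output)
```

The second request would then send the original user message followed by `next_items` as its input, since the full call chain has to be resent.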
Comparison with Chat Completions
| Feature | Chat Completions | Responses API |
|---|---|---|
| Endpoint | /v1/chat/completions | /v1/responses |
| Input format | messages array | input string or structured item array |
| System instructions | role: "system" message | instructions parameter (independently cached) |
| Prompt Caching | System instructions are mixed into messages, cache prefix is unstable | instructions passed separately, automatically cached, higher hit rate |
| Output format | choices[0].message.content | output[0].content[0].text or output_text |
| Tool calls | tool_calls inside message | Dedicated function_call output item |
| Tool results | role: "tool" message | function_call_output input item |
| Streaming events | chat.completion.chunk | Structured event types (response.*) |
| Token fields | prompt_tokens / completion_tokens | input_tokens / output_tokens |
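To make the mapping in the table concrete, here is a minimal sketch that converts a text-only Chat Completions messages array into the instructions and input pair. `chat_to_responses` is a hypothetical helper; it does not handle tool messages or multimodal content, and the `msg_<index>` ids it assigns to assistant turns are placeholders rather than real API ids.

```python
def chat_to_responses(messages):
    """Split Chat Completions messages into (instructions, input items)."""
    instructions = []
    items = []
    for index, msg in enumerate(messages):
        role, text = msg["role"], msg["content"]
        if role == "system":
            # System messages move out of the history into `instructions`.
            instructions.append(text)
        elif role == "user":
            items.append({
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": text}],
            })
        elif role == "assistant":
            items.append({
                "type": "message",
                "role": "assistant",
                "id": f"msg_{index}",  # placeholder id (assumption)
                "status": "completed",
                "content": [{"type": "output_text", "text": text, "annotations": []}],
            })
        else:
            raise ValueError(f"unsupported role in this sketch: {role}")
    return ("\n".join(instructions) or None), items


messages = [
    {"role": "system", "content": "You are a helpful technical assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "How many people live there?"},
]
instructions, input_items = chat_to_responses(messages)
```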
Both APIs are production-ready. If you already have a Chat Completions integration, no migration is required. We recommend the Responses API for new projects, especially for use cases with complex tool calling flows or high request volumes (where caching can significantly reduce costs). See the Function Calling guide for details.