Responses API (Recommended)

Create model responses. Supports text and image input, generating text or JSON output. Supports Function Calling, streaming responses, and multi-turn conversations.

Recommended for new projects. This is OpenAI’s next-generation API and offers the following advantages over Chat Completions:

  • Native Prompt Caching: instructions and input are separated, so system instructions automatically serve as the cache prefix. The unchanged prefix in multi-turn conversations achieves higher cache hit rates, saving up to 50% on input token costs while reducing latency.
  • Structured item model: Cleaner input/output format with native support for tool calling flows.
  • Richer streaming events: Fine-grained SSE event types make real-time UI rendering easier.

Endpoint

POST https://api.hao.ai/v1/responses

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | | Model identifier, e.g. openai/gpt-5.4-mini |
| input | string or array | | Input content, either a plain text string or a structured message array |
| instructions | string | | System instructions (separate from input, automatically benefits from Prompt Caching) |
| stream | boolean | | Whether to enable SSE streaming response, default false |
| max_output_tokens | number | | Maximum number of tokens to generate |
| temperature | number | | Sampling temperature 0-2, default 1 |
| top_p | number | | Nucleus sampling parameter |
| tools | array | | Available tool definitions (Function Calling) |
| tool_choice | string or object | | Tool selection strategy: auto, none, or a specific tool |
| truncation | string | | Truncation strategy: auto to truncate automatically / disabled to error on overflow (default) |
| text | object | | Text generation format configuration |
| store | boolean | | Whether to store the response (default true) |
| metadata | object | | Custom metadata key-value pairs |
| provider | object | | HaoAI extension: routing and fallback config |
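
As a rough illustration of how these parameters fit together, the sketch below assembles a request body as a plain dictionary and checks two of the documented constraints (temperature between 0 and 2, truncation set to auto or disabled). The helper name `build_request` is hypothetical, and the checks are client-side conveniences rather than documented server behavior.

```python
import json

ALLOWED_TRUNCATION = {"auto", "disabled"}


def build_request(model, input_, instructions=None, **options):
    """Assemble a /v1/responses request body (hypothetical helper)."""
    temperature = options.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    truncation = options.get("truncation")
    if truncation is not None and truncation not in ALLOWED_TRUNCATION:
        raise ValueError("truncation must be 'auto' or 'disabled'")

    body = {"model": model, "input": input_}
    if instructions is not None:
        body["instructions"] = instructions
    # Only send options the caller actually set; omitted ones keep their defaults.
    body.update({k: v for k, v in options.items() if v is not None})
    return body


payload = build_request(
    "openai/gpt-5.4-mini",
    "Explain what an API Gateway is",
    instructions="You are a helpful technical assistant. Respond in English.",
    max_output_tokens=1024,
    temperature=0.7,
)
print(json.dumps(payload, indent=2))
```

Leaving unset options out of the body keeps the request small and lets the defaults listed in the table apply.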

Input Format

input supports two formats:

1. Simple string — pass text directly

{ "input": "Hello, please introduce yourself" }

2. Structured message array — multi-turn conversations and multimodal input

    interface InputItem {
      type: 'message'
      role: 'user' | 'assistant'
      content: ContentPart[]
      id?: string            // Required for assistant messages
      status?: 'completed'   // Required for assistant messages
    }

    type ContentPart =
      | { type: 'input_text'; text: string }                          // User text input
      | { type: 'input_image'; image_url: string }                    // Image input
      | { type: 'output_text'; text: string; annotations?: any[] }    // Assistant text output

When including assistant role messages in multi-turn conversations, the id and status fields are required. The Responses API is stateless by design — each request must carry the full conversation history.
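
Because every request carries the full history, it can help to keep the conversation in a list and append to it after each turn. The following is a minimal sketch of that bookkeeping; the helper names `user_message` and `assistant_message` are hypothetical, and reusing the message id returned in a previous response as the assistant item's id is an assumption, not something this page states.

```python
def user_message(text):
    """Build a user input item with a single text part."""
    return {
        "type": "message",
        "role": "user",
        "content": [{"type": "input_text", "text": text}],
    }


def assistant_message(text, message_id):
    """Build an assistant input item; id and status are required for this role."""
    return {
        "type": "message",
        "role": "assistant",
        "id": message_id,
        "status": "completed",
        "content": [{"type": "output_text", "text": text, "annotations": []}],
    }


history = [user_message("What is the capital of France?")]
# Assumption: the id comes from the message item of the previous response.
history.append(assistant_message("The capital of France is Paris.", "msg_abc123"))
history.append(user_message("How many people live there?"))
# `history` can now be passed as the `input` of the next request.
```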

Request Examples

Terminal
    curl https://api.hao.ai/v1/responses \
      -H "Authorization: Bearer $HAOAI_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "openai/gpt-5.4-mini",
        "input": "Explain what an API Gateway is",
        "instructions": "You are a helpful technical assistant. Respond in English.",
        "max_output_tokens": 1024
      }'

Response Format

    {
      "id": "resp_abc123",
      "object": "response",
      "created_at": 1703123456,
      "model": "openai/gpt-5.4-mini",
      "status": "completed",
      "output": [
        {
          "type": "message",
          "id": "msg_def456",
          "status": "completed",
          "role": "assistant",
          "content": [
            {
              "type": "output_text",
              "text": "An API Gateway is a...",
              "annotations": []
            }
          ]
        }
      ],
      "usage": {
        "input_tokens": 25,
        "output_tokens": 150,
        "total_tokens": 175
      }
    }

Response Field Reference

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique response identifier, prefixed with resp_ |
| object | string | Fixed value "response" |
| created_at | number | Creation timestamp (Unix seconds) |
| model | string | The model ID actually used |
| status | string | Response status: completed, failed, in_progress, cancelled |
| output | array | Output item array, containing messages and tool calls |
| usage | object | Token usage statistics |
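
When working with the raw JSON rather than an SDK object, the text has to be collected from the output array by hand. The sketch below shows one way to do that for the fields described above; `extract_output_text` is a hypothetical helper, and the sample dictionary is a trimmed copy of the example response.

```python
def extract_output_text(response):
    """Concatenate every output_text part found in message items."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip tool calls and other non-message items
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part["text"])
    return "".join(parts)


sample = {
    "id": "resp_abc123",
    "object": "response",
    "status": "completed",
    "output": [
        {
            "type": "message",
            "id": "msg_def456",
            "status": "completed",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": "An API Gateway is a...", "annotations": []}
            ],
        }
    ],
    "usage": {"input_tokens": 25, "output_tokens": 150, "total_tokens": 175},
}

if sample["status"] == "completed":
    text = extract_output_text(sample)
    print(text)
```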

Structured Message Input

Use a structured message array for multi-turn conversations:

multi_turn.py
    import os

    from openai import OpenAI

    # Assumption: the official OpenAI Python SDK, pointed at the HaoAI endpoint.
    client = OpenAI(
        base_url="https://api.hao.ai/v1",
        api_key=os.environ["HAOAI_API_KEY"],
    )

    response = client.responses.create(
        model="openai/gpt-5.4-mini",
        input=[
            {
                "type": "message",
                "role": "user",
                "content": [
                    {"type": "input_text", "text": "What is the capital of France?"}
                ],
            },
            {
                # Assistant turns must carry id and status.
                "type": "message",
                "role": "assistant",
                "id": "msg_abc123",
                "status": "completed",
                "content": [
                    {
                        "type": "output_text",
                        "text": "The capital of France is Paris.",
                        "annotations": [],
                    }
                ],
            },
            {
                "type": "message",
                "role": "user",
                "content": [
                    {"type": "input_text", "text": "How many people live there?"}
                ],
            },
        ],
    )
    print(response.output_text)

Streaming

Set stream: true to enable SSE streaming responses:

stream.py
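    import os

    from openai import OpenAI

    # Assumption: the official OpenAI Python SDK, pointed at the HaoAI endpoint.
    client = OpenAI(
        base_url="https://api.hao.ai/v1",
        api_key=os.environ["HAOAI_API_KEY"],
    )

    stream = client.responses.create(
        model="openai/gpt-5.4-mini",
        input="Tell me a joke about programming",
        stream=True,
    )

    # Print text deltas as they arrive; other event types are ignored here.
    for event in stream:
        if event.type == "response.output_text.delta":
            print(event.delta, end="", flush=True)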

Streaming Event Types

Streaming responses send the following events via SSE:

    data: {"type":"response.created","response":{"id":"resp_abc123","object":"response","status":"in_progress"}}
    data: {"type":"response.output_item.added","output_index":0,"item":{"type":"message","id":"msg_def456","role":"assistant","status":"in_progress","content":[]}}
    data: {"type":"response.content_part.added","output_index":0,"content_index":0,"part":{"type":"output_text","text":""}}
    data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":"Hello"}
    data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":" there"}
    data: {"type":"response.output_item.done","output_index":0,"item":{"type":"message","id":"msg_def456","role":"assistant","status":"completed","content":[{"type":"output_text","text":"Hello there..."}]}}
    data: {"type":"response.completed","response":{"id":"resp_abc123","object":"response","status":"completed","usage":{"input_tokens":12,"output_tokens":45,"total_tokens":57}}}
    data: [DONE]

| Event Type | Description |
| --- | --- |
| response.created | Response object created |
| response.output_item.added | New output item added |
| response.content_part.added | New content part added |
| response.output_text.delta | Text delta (token-by-token output) |
| response.output_item.done | Output item finished |
| response.completed | Response fully completed |
| response.function_call_arguments.delta | Function call arguments delta |
| response.function_call_arguments.done | Function call arguments finished |
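
For clients that read the SSE stream without an SDK, the events can be decoded line by line. The sketch below is one possible approach, assuming each event arrives as a single `data:` line as in the sample above; `collect_text` is a hypothetical helper that accumulates only the text deltas and stops at the `[DONE]` sentinel.

```python
import json


def collect_text(sse_lines):
    """Join the text deltas from an iterable of raw SSE lines."""
    chunks = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        if event.get("type") == "response.output_text.delta":
            chunks.append(event["delta"])
    return "".join(chunks)


sample_lines = [
    'data: {"type":"response.created","response":{"id":"resp_abc123","object":"response","status":"in_progress"}}',
    'data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":"Hello"}',
    "",
    'data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":" there"}',
    "data: [DONE]",
]
print(collect_text(sample_lines))  # Hello there
```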

Function Calling

The Responses API natively supports tool calling:

tools.py
    import os

    from openai import OpenAI

    # Assumption: the official OpenAI Python SDK, pointed at the HaoAI endpoint.
    client = OpenAI(
        base_url="https://api.hao.ai/v1",
        api_key=os.environ["HAOAI_API_KEY"],
    )

    response = client.responses.create(
        model="openai/gpt-5.4-mini",
        input="What's the weather like in Beijing today?",
        tools=[
            {
                "type": "function",
                "name": "get_weather",
                "description": "Get the current weather for the specified city",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "City name, e.g. Beijing",
                        },
                        "unit": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"],
                        },
                    },
                    "required": ["location"],
                },
            }
        ],
        tool_choice="auto",
    )

    # Handle tool calls
    for item in response.output:
        if item.type == "function_call":
            print(f"Calling function: {item.name}")
            print(f"Arguments: {item.arguments}")

Tool Call Response Format

When the model calls a tool, output contains a function_call type item:

    {
      "id": "resp_abc123",
      "object": "response",
      "status": "completed",
      "output": [
        {
          "type": "function_call",
          "id": "fc_abc123",
          "call_id": "call_xyz789",
          "name": "get_weather",
          "arguments": "{\"location\":\"Beijing\",\"unit\":\"celsius\"}"
        }
      ],
      "usage": {
        "input_tokens": 45,
        "output_tokens": 25,
        "total_tokens": 70
      }
    }
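
Note that arguments is a JSON-encoded string, not an object, so it needs to be decoded before use. A small sketch of pulling the calls out of a raw response dictionary follows; `parse_function_calls` is a hypothetical helper name.

```python
import json


def parse_function_calls(response):
    """Return (call_id, name, decoded_arguments) for each function_call item."""
    calls = []
    for item in response.get("output", []):
        if item.get("type") == "function_call":
            arguments = json.loads(item["arguments"])  # string -> dict
            calls.append((item["call_id"], item["name"], arguments))
    return calls


tool_response = {
    "id": "resp_abc123",
    "object": "response",
    "status": "completed",
    "output": [
        {
            "type": "function_call",
            "id": "fc_abc123",
            "call_id": "call_xyz789",
            "name": "get_weather",
            "arguments": '{"location":"Beijing","unit":"celsius"}',
        }
    ],
}

for call_id, name, arguments in parse_function_calls(tool_response):
    print(call_id, name, arguments["location"])
```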

Submitting Tool Results

Send the tool execution result back to the model by including the full call chain in input:

    # Second request: submit the tool result (reuses the client from the first request)
    response = client.responses.create(
        model="openai/gpt-5.4-mini",
        input=[
            {
                "type": "message",
                "role": "user",
                "content": [
                    {"type": "input_text", "text": "What's the weather like in Beijing today?"}
                ],
            },
            {
                # The function_call item returned by the first request, echoed back
                "type": "function_call",
                "id": "fc_abc123",
                "call_id": "call_xyz789",
                "name": "get_weather",
                "arguments": "{\"location\":\"Beijing\",\"unit\":\"celsius\"}",
            },
            {
                # The tool result, linked to the call through call_id
                "type": "function_call_output",
                "id": "fco_abc123",
                "call_id": "call_xyz789",
                "output": "{\"temperature\":\"22°C\",\"condition\":\"sunny\"}",
            },
        ],
    )
    print(response.output_text)
    # => "It's sunny in Beijing today, 22°C, perfect for outdoor activities."
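
The step between the two requests, running the tool locally and packaging its result, might look like the sketch below. It is only an illustration: the `get_weather` body is a stand-in that returns fixed data, and `TOOLS` and `tool_output_item` are hypothetical names. The id of the output item is supplied by the caller, mirroring the `fco_abc123` value in the example above.

```python
import json


def get_weather(location, unit="celsius"):
    # Stand-in implementation; a real tool would query a weather service.
    return {"temperature": "22°C", "condition": "sunny"}


TOOLS = {"get_weather": get_weather}


def tool_output_item(function_call, output_id):
    """Run the requested tool and wrap its result as a function_call_output item."""
    handler = TOOLS[function_call["name"]]
    result = handler(**json.loads(function_call["arguments"]))
    return {
        "type": "function_call_output",
        "id": output_id,
        "call_id": function_call["call_id"],  # must match the original call
        "output": json.dumps(result, ensure_ascii=False),
    }


call = {
    "type": "function_call",
    "id": "fc_abc123",
    "call_id": "call_xyz789",
    "name": "get_weather",
    "arguments": '{"location":"Beijing","unit":"celsius"}',
}
user_turn = {
    "type": "message",
    "role": "user",
    "content": [{"type": "input_text", "text": "What's the weather like in Beijing today?"}],
}
# Full call chain for the second request: user turn, call, result.
followup_input = [user_turn, call, tool_output_item(call, "fco_abc123")]
```

Looking tools up in a dictionary keeps the dispatch step separate from the tool implementations, so adding a tool only means registering another entry.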

Tool Choice Options

| Value | Description |
| --- | --- |
| "auto" | The model decides whether to call a tool (default) |
| "none" | Disallow tool calls |
| {"type": "function", "name": "tool_name"} | Force a specific tool to be called |

Comparison with Chat Completions

| Feature | Chat Completions | Responses API |
| --- | --- | --- |
| Endpoint | /v1/chat/completions | /v1/responses |
| Input format | messages array | input string or structured item array |
| System instructions | role: "system" message | instructions parameter (independently cached) |
| Prompt Caching | System instructions are mixed into messages, cache prefix is unstable | instructions passed separately, automatically cached, higher hit rate |
| Output format | choices[0].message.content | output[0].content[0].text or output_text |
| Tool calls | tool_calls inside message | Dedicated function_call output item |
| Tool results | role: "tool" message | function_call_output input item |
| Streaming events | chat.completion.chunk | Structured event types (response.*) |
| Token fields | prompt_tokens / completion_tokens | input_tokens / output_tokens |
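
To make the input-side rows of the table concrete, here is a rough sketch of converting a Chat Completions messages array into the instructions and input fields. It assumes plain string content and only the system, user, and assistant roles. The function name `to_responses_request` is hypothetical, and the generated assistant ids are placeholders: whether the API accepts arbitrary ids for replayed assistant messages is an assumption to verify.

```python
def to_responses_request(messages):
    """Split Chat Completions messages into instructions plus input items."""
    instructions = []
    items = []
    for index, message in enumerate(messages):
        role, text = message["role"], message["content"]
        if role == "system":
            instructions.append(text)
        elif role == "user":
            items.append({
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": text}],
            })
        elif role == "assistant":
            items.append({
                "type": "message",
                "role": "assistant",
                "id": f"msg_{index}",  # placeholder id (assumption)
                "status": "completed",
                "content": [{"type": "output_text", "text": text, "annotations": []}],
            })
        else:
            raise ValueError(f"unsupported role: {role}")
    request = {"input": items}
    if instructions:
        request["instructions"] = "\n".join(instructions)
    return request


chat_messages = [
    {"role": "system", "content": "You are a helpful technical assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "How many people live there?"},
]
converted = to_responses_request(chat_messages)
```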

Both APIs are production-ready. If you already have a Chat Completions integration, no migration is required. We recommend the Responses API for new projects, especially for use cases with complex tool calling flows or high request volumes (where caching can significantly reduce costs). See the Function Calling guide for details.