Streaming

Streaming uses standard server-sent events (SSE) and should be your default for interactive user experiences.

Set stream=true in the request body to receive token chunks as they are generated. Streaming dramatically improves perceived responsiveness because users see partial output before the full completion arrives.
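A minimal sketch of building such a request body. The endpoint URL and HTTP client are left out; only the stream=true flag and message shape come from this document, and the helper name is illustrative.

```python
def build_stream_request(model: str, messages: list) -> dict:
    """Build a JSON body for a streaming chat completion (stream=true)."""
    return {"model": model, "messages": messages, "stream": True}

body = build_stream_request(
    "openai/gpt-5-mini",
    [{"role": "user", "content": "Hi"}],
)
# POST this body with an SSE-capable client, e.g.
# requests.post(url, json=body, stream=True), then iterate response lines.
```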

Each chunk arrives as an SSE event of the form data: <json>, followed by a blank-line delimiter. The stream terminates with data: [DONE].
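That framing can be handled with a small line parser. This is a sketch, not an official client: it returns a parsed chunk dict, the sentinel string "done" for the terminator, or None for blank delimiter lines.

```python
import json

def parse_sse_line(line: str):
    """Parse one SSE line: chunk dict, "done", or None for ignorable lines."""
    line = line.strip()
    if not line.startswith("data: "):
        return None  # blank delimiters (and SSE comments) carry no payload
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return "done"
    return json.loads(payload)

chunk = parse_sse_line('data: {"choices":[{"delta":{"content":"Hel"}}]}')
# chunk["choices"][0]["delta"]["content"] == "Hel"
```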

SSE Example

```sse
data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1760000000,"model":"openai/gpt-5-mini","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1760000000,"model":"openai/gpt-5-mini","choices":[{"index":0,"delta":{"content":"Hel"},"finish_reason":null}]}

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1760000000,"model":"openai/gpt-5-mini","choices":[{"index":0,"delta":{"content":"lo"},"finish_reason":null}]}

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","created":1760000000,"model":"openai/gpt-5-mini","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":18,"completion_tokens":9,"total_tokens":27}}

data: [DONE]
```
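Reassembling the example above means concatenating each delta's content field and watching finish_reason. A sketch, using the delta payloads from the example stream:

```python
import json

# The delta payloads from the example stream, in arrival order.
events = [
    '{"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    '{"choices":[{"index":0,"delta":{"content":"Hel"},"finish_reason":null}]}',
    '{"choices":[{"index":0,"delta":{"content":"lo"},"finish_reason":null}]}',
    '{"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
]

text = ""
finish_reason = None
for raw in events:
    choice = json.loads(raw)["choices"][0]
    # Deltas may omit content (e.g. the role-only first chunk).
    text += choice["delta"].get("content", "")
    if choice["finish_reason"]:
        finish_reason = choice["finish_reason"]

print(text)           # -> Hello
print(finish_reason)  # -> stop
```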

Client Implementation Notes

Apply chunks to UI state incrementally rather than buffering the full response. On disconnect, show the partial content and expose a retry action. If you retry, preserve idempotency controls on your side so repeated prompts do not trigger duplicate side effects.
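One way to sketch that idempotency control: generate one key per logical request, reuse it across retries, and skip side effects for keys already seen. The class and key scheme here are illustrative, not part of any API.

```python
import uuid

class RetryDeduper:
    """Track idempotency keys so a retried request runs side effects once."""

    def __init__(self):
        self._seen = set()

    def run_once(self, key: str, side_effect) -> bool:
        """Run side_effect only the first time this key is seen."""
        if key in self._seen:
            return False  # duplicate retry: skip the side effect
        self._seen.add(key)
        side_effect()
        return True

calls = []
dedupe = RetryDeduper()
key = str(uuid.uuid4())  # one key per logical request, reused on retry
dedupe.run_once(key, lambda: calls.append("sent"))
dedupe.run_once(key, lambda: calls.append("sent"))  # retry after disconnect
# calls == ["sent"]
```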