Streaming

Streaming lets your application receive partial model output as it is produced. Use it for chat UIs, long-running responses, progress reporting, and command-line tools that should show output immediately instead of waiting for the final message.

Most LLM providers support token-by-token streaming. Deterministic providers such as CLIPS and Z3 may emit a single result chunk or a small number of status updates, depending on the operation.

Use streaming when:

  • Users should see output immediately.
  • Responses may be long.
  • You want progress or partial results.
  • You are building CLI or chat interfaces.

Use a regular call when:

  • You need one complete JSON response.
  • Responses are small and predictable.
  • You need simpler error handling.
  • You are running batch jobs.
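
A minimal Rust example; it assumes the tokio and futures crates are available alongside nxuskit: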
use futures::StreamExt;
use nxuskit::{completion_stream, NxuskitError};

#[tokio::main]
async fn main() -> Result<(), NxuskitError> {
    // Request a streaming completion; chunks arrive as the model generates.
    let mut stream = completion_stream("gpt-4o", "Count from one to five.").await?;

    // Each chunk is a Result: print the text or propagate the error.
    while let Some(chunk) = stream.next().await {
        print!("{}", chunk?);
    }
    Ok(())
}
The same request from the command line:
nxuskit-cli chat \
  --provider openai \
  --model gpt-4o \
  --stream \
  "Explain streaming responses in one paragraph."

For machine-readable output, use the Level 1 call command with JSONL:

echo '{"provider":"openai","model":"gpt-4o","prompt":"Count to five."}' \
| nxuskit-cli call --input - --format jsonl --stream
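
A consumer can read that stream line by line, parsing each JSONL line as a standalone JSON object. The sketch below assumes a content field holding the text delta; that field name is an assumption, so check it against the actual JSONL schema. It uses the serde_json crate:

use std::io::{self, BufRead};

fn main() -> io::Result<()> {
    // Each JSONL line from `nxuskit-cli call --format jsonl --stream`
    // is one standalone JSON object.
    for line in io::stdin().lock().lines() {
        let line = line?;
        if line.trim().is_empty() {
            continue;
        }
        let event: serde_json::Value = serde_json::from_str(&line)
            .expect("each JSONL line should be valid JSON");
        // `content` is an assumed field name; adjust to the real schema.
        if let Some(delta) = event.get("content").and_then(|v| v.as_str()) {
            print!("{delta}");
        }
    }
    Ok(())
}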

Streaming APIs generally emit:

  • Content chunks — Partial text or structured deltas.
  • Metadata — Provider, model, token usage, timing, or trace information when available.
  • Completion — Final status, finish reason, or terminal error.

The exact fields vary by provider. Code that handles multiple providers should append content chunks as they arrive and treat metadata as optional.
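
As a sketch of that advice, the hypothetical StreamEvent type below models the three categories; the variant names and fields are assumptions, not the real nxuskit types:

// Hypothetical event shape; real chunk types vary by provider.
enum StreamEvent {
    Content(String),                       // partial text delta
    Metadata { tokens_used: Option<u32> }, // optional provider info
    Done { finish_reason: String },        // terminal status
}

// Append content in arrival order; treat metadata as optional extras.
fn consume(events: Vec<StreamEvent>) -> String {
    let mut text = String::new();
    for event in events {
        match event {
            StreamEvent::Content(delta) => text.push_str(&delta),
            StreamEvent::Metadata { tokens_used } => {
                if let Some(n) = tokens_used {
                    eprintln!("tokens so far: {n}");
                }
            }
            StreamEvent::Done { finish_reason } => {
                eprintln!("finished: {finish_reason}");
            }
        }
    }
    text
}

fn main() {
    let events = vec![
        StreamEvent::Content("Hello, ".into()),
        StreamEvent::Content("world.".into()),
        StreamEvent::Metadata { tokens_used: Some(3) },
        StreamEvent::Done { finish_reason: "stop".into() },
    ];
    println!("{}", consume(events));
}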

Handle errors in the stream loop, not only at stream creation time. A provider can accept a request and still fail later because of rate limits, network interruptions, token limits, or a cancelled operation.
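
A sketch of that pattern, reusing completion_stream from the earlier example: a mid-stream error is handled inside the loop, and the partial output collected so far is kept rather than discarded.

use futures::StreamExt;
use nxuskit::{completion_stream, NxuskitError};

#[tokio::main]
async fn main() -> Result<(), NxuskitError> {
    // Creation can fail here...
    let mut stream = completion_stream("gpt-4o", "Explain backpressure.").await?;
    let mut output = String::new();

    // ...but the stream can also fail later: rate limits, network
    // interruptions, token limits, or a cancelled operation.
    while let Some(chunk) = stream.next().await {
        match chunk {
            Ok(delta) => output.push_str(&delta),
            Err(err) => {
                eprintln!("stream failed after {} bytes: {err}", output.len());
                break; // keep the partial output
            }
        }
    }

    println!("{output}");
    Ok(())
}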