Streaming
Overview
Streaming lets your application receive partial model output as it is produced. Use it for chat UIs, long-running responses, progress reporting, and command-line tools that should show output immediately instead of waiting for the final message.
Most LLM providers support token-by-token streaming. Deterministic providers such as CLIPS and Z3 may emit a single result chunk or a small number of status updates, depending on the operation.
When to Use Streaming
| Use streaming when | Use a regular call when |
|---|---|
| Users should see output immediately | You need one complete JSON response |
| Responses may be long | Responses are small and predictable |
| You want progress or partial results | You need simpler error handling |
| You are building CLI or chat interfaces | You are running batch jobs |
A minimal Rust example:

```rust
use futures::StreamExt;
use nxuskit::{completion_stream, NxuskitError};
use std::io::Write;

#[tokio::main]
async fn main() -> Result<(), NxuskitError> {
    let mut stream = completion_stream("gpt-4o", "Count from one to five.").await?;

    while let Some(chunk) = stream.next().await {
        print!("{}", chunk?);
        // Flush so each chunk is visible immediately; stdout is line-buffered by default.
        std::io::stdout().flush().ok();
    }

    Ok(())
}
```

Streaming is also available from the CLI by passing `--stream`:

```sh
nxuskit-cli chat \
  --provider openai \
  --model gpt-4o \
  --stream \
  "Explain streaming responses in one paragraph."
```

For machine-readable output, use the Level 1 `call` command with JSONL:
echo '{"provider":"openai","model":"gpt-4o","prompt":"Count to five."}' \ | nxuskit-cli call --input - --format jsonl --streamStream Events
Stream Events
Streaming APIs generally emit:
- Content chunks — Partial text or structured deltas.
- Metadata — Provider, model, token usage, timing, or trace information when available.
- Completion — Final status, finish reason, or terminal error.
The exact fields vary by provider. Code that handles multiple providers should append content chunks as they arrive and treat metadata as optional.
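As a sketch of that pattern, the handler below uses a hypothetical `StreamEvent` enum; the actual item types exposed by nxuskit may differ by provider and version:

```rust
// Hypothetical event shape for illustration only; real nxuskit stream items may differ.
enum StreamEvent {
    Content(String),                                     // partial text delta
    Metadata { provider: String, tokens: Option<u32> },  // optional usage/trace info
    Done { finish_reason: Option<String> },              // terminal status
}

// Append content as it arrives; treat metadata and finish reasons as optional.
fn handle_event(event: StreamEvent, transcript: &mut String) {
    match event {
        StreamEvent::Content(delta) => transcript.push_str(&delta),
        StreamEvent::Metadata { provider, tokens } => {
            eprintln!("metadata from {provider}: tokens={tokens:?}");
        }
        StreamEvent::Done { finish_reason } => {
            eprintln!("finished: {finish_reason:?}");
        }
    }
}
```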
Error Handling
Handle errors in the stream loop, not only at stream creation time. A provider can accept a request and still fail later because of rate limits, network interruptions, token limits, or a cancelled operation.
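For example, a loop that matches on each item instead of using `?` can report how much output arrived before the failure. The sketch below reuses `completion_stream` from the earlier example and assumes `NxuskitError` implements `Display`:

```rust
use futures::StreamExt;
use nxuskit::{completion_stream, NxuskitError};

async fn stream_with_recovery() -> Result<String, NxuskitError> {
    let mut stream = completion_stream("gpt-4o", "Summarize streaming.").await?;
    let mut collected = String::new();

    while let Some(chunk) = stream.next().await {
        match chunk {
            Ok(text) => collected.push_str(&text),
            // The stream can fail mid-response (rate limits, network drops,
            // token limits, cancellation) even after the request was accepted.
            Err(err) => {
                eprintln!("stream interrupted after {} chars: {err}", collected.len());
                return Err(err);
            }
        }
    }

    Ok(collected)
}
```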