Vision

Send images alongside text prompts for multimodal analysis

Send images alongside text to any LLM provider using one consistent API that handles encoding, formatting, and provider differences for you.
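To make concrete what "encoding, formatting, and provider differences" covers, here is a minimal Python sketch that shapes the same image-plus-prompt request for two providers by hand. It does not use the SDK and is not its API. The function names `encode_image`, `anthropic_content`, and `openai_content` are hypothetical, and the payload shapes follow the publicly documented Anthropic Messages and OpenAI Chat Completions image formats.

```python
import base64


def encode_image(data: bytes) -> str:
    """Base64-encode raw image bytes as ASCII text."""
    return base64.b64encode(data).decode("ascii")


def anthropic_content(prompt: str, image: bytes, media_type: str = "image/png") -> list:
    """Content blocks in the Anthropic Messages style: a base64 image source plus text."""
    return [
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": media_type,
                "data": encode_image(image),
            },
        },
        {"type": "text", "text": prompt},
    ]


def openai_content(prompt: str, image: bytes, media_type: str = "image/png") -> list:
    """Content parts in the OpenAI Chat Completions style: text plus a data-URL image."""
    data_url = f"data:{media_type};base64,{encode_image(image)}"
    return [
        {"type": "text", "text": prompt},
        {"type": "image_url", "image_url": {"url": data_url}},
    ]


if __name__ == "__main__":
    # Placeholder bytes (the PNG signature only), standing in for a real image file.
    fake_png = b"\x89PNG\r\n\x1a\n"
    print(anthropic_content("Describe this image.", fake_png))
    print(openai_content("Describe this image.", fake_png))
```

The two payloads carry identical information but differ in field names, nesting, and how the base64 data is wrapped, which is the kind of difference a single consistent API is meant to hide.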

Edition

Community — runs on the OSS / Community SDK edition.

Difficulty: Intermediate 🟦 · LLM · Vision

Summary: Vision and multimodal capabilities with images
Scenario: Send images alongside text prompts for multimodal analysis
Manifest: example id `vision` in conformance/examples_manifest.json, with tech_tags LLM and Vision.

SDK: Use an installed SDK tree (set NXUSKIT_SDK_DIR, and NXUSKIT_LIB_PATH if needed). test-examples.sh resolves Go, Rust, and Python dependencies from that tree only. See README.md, scripts/setup-sdk.sh, and scripts/test-examples.sh.
Languages in this example: Go, Python, Rust, Bash. Each implementation lives in a path under this directory; the Python version may instead live under a sibling python/ directory or a shared reference, as described in Language Implementations.
Models: Set cloud provider API keys for live Claude/OpenAI calls, or run metadata-only CLI/Bash mode with VISION_RUN_LIVE=0. Ollama vision models can be selected when the local provider path is available.
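As a rough illustration of the live versus metadata-only switch, the sketch below reads VISION_RUN_LIVE from an environment mapping. Only that variable name comes from this example. The helper `select_mode`, the key names ANTHROPIC_API_KEY and OPENAI_API_KEY, the default of live mode, and the fallback when no key is set are all assumptions, not the example's actual logic.

```python
import os


def select_mode(env=None) -> str:
    """Return "live" or "metadata-only" for a given environment mapping.

    Assumptions (not taken from the example's source): live mode is the
    default, and a missing API key falls back to metadata-only.
    """
    env = os.environ if env is None else env

    # An explicit VISION_RUN_LIVE=0 always wins, even when keys are present.
    if env.get("VISION_RUN_LIVE", "1") == "0":
        return "metadata-only"

    # Hypothetical key names; check your provider setup for the real ones.
    if env.get("ANTHROPIC_API_KEY") or env.get("OPENAI_API_KEY"):
        return "live"

    return "metadata-only"


if __name__ == "__main__":
    print(select_mode())
```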

Use cases: image captioning, visual question answering, and document understanding.

Prerequisites: an installed SDK tree, with NXUSKIT_SDK_DIR pointing at it. See the repository README.md and scripts/test-examples.sh.

# From `/examples/patterns/vision` (each command runs in a subshell, so the working directory does not change):
(cd rust && cargo build)
(cd go && make build)
(cd python && python3 main.py --help)
(cd bash && make build)

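# Rust
cd rust
cargo run

# Go
cd go
make build && bin/vision

# Python
cd python
python3 main.py

# CLI/Bash
cd bash
make run
VISION_RUN_LIVE=0 make run    # metadata-only mode, no live provider calls
make run ARGS="openai"        # pass "openai" as an argument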

# Rust
cd rust && cargo test

# Go
cd go && go test -v

# Python smoke
cd python && python3 main.py --help

# CLI/Bash
cd bash && make test