Skip to content

Vision

Send images alongside text prompts for multimodal analysis

Send images alongside text to any LLM provider using one consistent API that handles encoding, formatting, and provider differences for you.

Community — runs on the OSS / Community SDK edition.

Difficulty: Intermediate 🟦 · LLM · Vision

  • Summary: Vision and multimodal capabilities with images
  • Scenario: Send images alongside text prompts for multimodal analysis
  • tech_tags in manifest: LLM, Vision — example id vision in conformance/examples_manifest.json.
  • SDK: Use an installed SDK tree (NXUSKIT_SDK_DIR, NXUSKIT_LIB_PATH as needed); test-examples.sh resolves Go/Rust/Python deps from that tree only — see README.md, scripts/setup-sdk.sh, and scripts/test-examples.sh.
  • Languages in this example: go, python, rust, bash (paths under this directory; Python may live under a sibling python/ or shared reference per Language Implementations).
  • Models: Set cloud provider API keys for live Claude/OpenAI calls, or run metadata-only CLI/Bash mode with VISION_RUN_LIVE=0. Ollama vision models can be selected when the local provider path is available.

Image captioning, visual QA, document understanding

LLM, Vision

LanguagePathStatus
Rustrust/Available
Gogo/Available
Pythonpython/Available
CLI/Bashbash/Available

Attach an installed SDK (NXUSKIT_SDK_DIR). See the repository README.md and scripts/test-examples.sh.

Terminal window
# From `/examples/patterns/vision`:
cd rust && cargo build
cd go && make build
cd python && python3 main.py --help
cd bash && make build
Terminal window
cd rust
cargo run
Terminal window
cd go
make build && bin/vision
Terminal window
cd python
python main.py
Terminal window
cd bash
make run
VISION_RUN_LIVE=0 make run
make run ARGS="openai"
Terminal window
# Rust
cd rust && cargo test
# Go
cd go && go test -v
# Python smoke
cd python && python3 main.py --help
# CLI/Bash
cd bash && make test