Curated Ollama Models
This note records local model passes used for the common-sense guardrails walkthrough. The first pass on 2026-05-07 combined ollama-cache list SSD residency with ollama list installed sizes and filtered to models under 8 GB. Follow-up passes on 2026-05-11 and 2026-05-12 added the car-wash fail/recover smoke shape and structured-output checks.
These are dated smoke-test starting points, not model rankings or product guarantees.
Source Notes
Section titled “Source Notes”- Live structured fact extraction prefers pure JSON. If the model wraps a valid JSON object in prose, the Python and Bash runners extract it and mark the structured-facts stage as
warn. If no valid JSON object is recoverable after retry, the stage is markedfailand the run falls back to checked-in fact fixtures so later guardrail stages can still demonstrate their behavior. - Ollama structured outputs are most reliable when the API
formatfield carries JSON or a JSON schema, and Ollama recommends grounding the prompt with the schema and using low temperature. Provider-native structured-output controls are still the robust path for future hardening, but this walkthrough no longer treats structured-facts failure as the expected result. - Local car-wash smokes found
qwen3.5:4bto be the best primary walkthrough candidate: small enough for local proof, stronger than the 2B option, and tested against the desired naivewalkfailure plus enhanceddriverecovery. - Local car-wash smokes found
qwen3.5:2buseful as the low-resource option with the same fail/recover shape. - Small-model smokes also found
gemma3:1banderukude/omni-json:1buseful for very small demos.nemotron-3-nano:4bis useful as a comparison point because separate tool-intent smokes showed native strict behavior. llama3.2remains a historical target candidate from earlier sweeps, but it is no longer the default recommendation for this walkthrough.
References:
- https://docs.ollama.com/capabilities/structured-outputs
- https://ollama.com/library/llama3.2
- https://ollama.com/library/gemma3
- DevOps Ollama model-testing notes from 2026-05-09 through 2026-05-12
Recommended Walkthrough Models
Section titled “Recommended Walkthrough Models”Use these first because they are small local smoke-test candidates that match the guardrail demo shape:
| Role | Model | Installed size | Why it is on the list | Walkthrough note |
|---|---|---|---|---|
| Primary live walkthrough | qwen3.5:4b | 3.4 GB | Stronger small Qwen 3.5 option from the 2026-05-11/12 local smokes. | Desired car-wash shape: naive answer fails as walk, constrained output is parseable, and enhanced object-presence prompting recovers to drive. Use this first when available. |
| Low-resource walkthrough | qwen3.5:2b | 2.7 GB | Smaller Qwen 3.5 option from the 2026-05-12 local smoke. | Same fail/recover car-wash shape at a smaller footprint. Use when local resource constraints matter more than maximum tool-intent strength. |
| Very small demo candidates | gemma3:1b or erukude/omni-json:1b | 815 MB / 1.4 GB | Very small models that reproduced the demo failure and recovery shape in local smokes. | Useful for constrained machines, but keep the primary docs and walkthrough centered on qwen3.5:4b. |
| Comparison candidate | nemotron-3-nano:4b | 2.8 GB | Car-wash fail/recover target plus separate native strict tool-call smoke evidence. | Interesting for comparison, but adding it to the main walkthrough can dilute the guardrails story. |
Other observed models:
| Model | Installed size | Use | Walkthrough note |
|---|---|---|---|
llama3.2 | 2.0 GB | Historical target candidate. | Earlier local sweeps showed a valid fail/recover shape, but newer release-surface guidance prefers qwen3.5:4b primary and qwen3.5:2b low-resource. |
phi4-mini-reasoning:3.8b | 3.2 GB | Avoid as default for this scenario. | Answered drive on the naive prompt, which removes the intended baseline failure. |
granite4:350m-h | 366 MB | Avoid as default for this scenario. | Failed to recover under the enhanced prompt in local smokes. |
qwen3:4b | 2.5 GB | Historical extraction experiment. | Direct JSON probes were useful, but newer Qwen 3.5 smokes are the better starting point for the full walkthrough. |
Current Walkthrough Default
Section titled “Current Walkthrough Default”Prefer qwen3.5:4b for the interactive walkthrough when it is available. It is the primary v1.0.x local proof candidate, remains small enough for a developer laptop, and has current local smoke evidence for the car-wash fail/recover shape.
export NXUSKIT_PROVIDER=ollamaexport NXUSKIT_MODEL=qwen3.5:4bexport OLLAMA_HOST=http://127.0.0.1:11434For a lower-resource run, use the 2B Qwen 3.5 variant:
export NXUSKIT_PROVIDER=ollamaexport NXUSKIT_MODEL=qwen3.5:2bexport OLLAMA_HOST=http://127.0.0.1:11434For phase-specific experiments, keep the stronger model on extraction and repair while trying a smaller baseline:
export NXUSKIT_PROVIDER=ollamaexport NXUSKIT_MODEL=qwen3.5:2bexport NXUSKIT_COMMON_SENSE_FACTS_MODEL=qwen3.5:4bexport NXUSKIT_COMMON_SENSE_REPAIR_MODEL=qwen3.5:4bexport OLLAMA_HOST=http://127.0.0.1:11434Structured Facts Posture
Section titled “Structured Facts Posture”Provider-native structured-output controls remain the preferred hardening path, especially Ollama JSON/schema formatting and thinking-mode controls when exposed through the installed SDK/CLI surface. In the v1.0.x examples line, the Bash/CLI runner disables thinking for short guardrail calls and requests JSON schema output for fact extraction. The example itself should be described more narrowly: live structured facts can pass with pure JSON, warn when valid JSON is recovered from prose, or fail after retry and fall back to fixtures. Failure is a fallback state, not the expected walkthrough result.